Google DeepMind wants to know if chatbots are just virtue signaling


With coding and math, you have clear-cut, correct answers that you can check, William Isaac, a research scientist at Google DeepMind, told me when I met him and Julia Haas, a fellow research scientist at the firm, for an exclusive preview of their work, which is published in Nature today. That’s not the case for moral questions, which typically have a range of acceptable answers: “Morality is an important capability but hard to evaluate,” says Isaac.

“In the moral domain, there’s no right and wrong,” adds Haas. “But it’s not by any means a free-for-all. There are better answers and there are worse answers.”

The researchers have identified several key challenges and suggested ways to address them. But it is more a wish list than a set of ready-made solutions. “They do a nice job of bringing together different perspectives,” says Vera Demberg, who studies LLMs at Saarland University in Germany.

Better than “The Ethicist”

A number of studies have shown that LLMs can display remarkable moral competence. One study published last year found that people in the US scored ethical advice from OpenAI’s GPT-4o as being more moral, trustworthy, thoughtful, and correct than advice given by the (human) writer of “The Ethicist,” a popular New York Times advice column.

The problem is that it is hard to unpick whether such behaviors are a performance—mimicking a memorized response, say—or evidence that there is in fact some kind of moral reasoning taking place inside the model. In other words, is it virtue or virtue signaling?

This question matters because multiple studies also show just how untrustworthy LLMs can be. For a start, models can be too eager to please. They have been found to flip their answer to a moral question and say the exact opposite when a person disagrees or pushes back on their first response. Worse, the answers an LLM gives to a question can change in response to how it is presented or formatted. For example, researchers have found that models quizzed about political values can give different—sometimes opposite—answers depending on whether the questions offer multiple-choice answers or instruct the model to respond in its own words.

In an even more striking case, Demberg and her colleagues presented several LLMs, including versions of Meta’s Llama 3 and Mistral, with a series of moral dilemmas and asked them to pick which of two options was the better outcome. The researchers found that the models often reversed their choice when the labels for those two options were changed from “Case 1” and “Case 2” to “(A)” and “(B).”

They also showed that models changed their answers in response to other tiny formatting tweaks, including swapping the order of the options and ending the question with a colon instead of a question mark.
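
To make that kind of test concrete, here is a minimal Python sketch of a formatting-robustness check. It is an illustration only, not the procedure Demberg's team used: the function names (`build_variants`, `consistency`), the prompt wording, and the two toy stand-in "models" are all assumptions made for this example. The idea is to pose the same two-option dilemma under every combination of label style, option order, and closing punctuation, map each reply back to the option it refers to, and measure how often the choice agrees.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

# Surface variations described in the article: label style, option order,
# and whether the question ends with "?" or ":".
LABEL_SETS = [("Case 1", "Case 2"), ("(A)", "(B)")]
ENDINGS = ["?", ":"]


def build_variants(question: str, option_x: str, option_y: str) -> List[Tuple[str, Dict[str, str]]]:
    """Return (prompt, label -> option text) pairs for every formatting combination."""
    variants = []
    for labels, swapped, ending in product(LABEL_SETS, [False, True], ENDINGS):
        first, second = (option_y, option_x) if swapped else (option_x, option_y)
        prompt = (
            f"{question}{ending}\n"
            f"{labels[0]}: {first}\n"
            f"{labels[1]}: {second}\n"
            "Answer with the label only."
        )
        variants.append((prompt, {labels[0]: first, labels[1]: second}))
    return variants


def consistency(ask_model: Callable[[str], str], question: str, option_x: str, option_y: str) -> float:
    """Fraction of variants whose underlying choice matches the most common choice."""
    choices = []
    for prompt, mapping in build_variants(question, option_x, option_y):
        label = ask_model(prompt).strip()
        choices.append(mapping.get(label))  # None if the reply is not a known label
    most_common = max(set(choices), key=choices.count)
    return choices.count(most_common) / len(choices)


# Two hypothetical stand-ins for a real LLM call, used only to exercise the check.
def first_listed_model(prompt: str) -> str:
    """Always picks whichever option is listed first (a pure position bias)."""
    return prompt.splitlines()[1].split(":")[0]


def stable_model(prompt: str) -> str:
    """Always picks the option mentioning 'truth', however it is labeled or ordered."""
    for line in prompt.splitlines()[1:3]:
        label, text = line.split(": ", 1)
        if "truth" in text:
            return label
    return ""


QUESTION = "Which outcome is better"
OPTION_X = "Tell the truth and upset a friend"
OPTION_Y = "Lie and spare their feelings"

print(consistency(first_listed_model, QUESTION, OPTION_X, OPTION_Y))
print(consistency(stable_model, QUESTION, OPTION_X, OPTION_Y))
```

In practice, `ask_model` would wrap a call to whatever model is being tested. A score of 1.0 means the choice survived every reformatting; a score near 0.5 on a two-option dilemma means the answer is tracking the formatting rather than the dilemma.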


