Anthropic details how it measures Claude’s wokeness


Anthropic is detailing its efforts to make its Claude AI chatbot “politically even-handed” — a move that comes just months after President Donald Trump issued a ban on “woke AI.” As outlined in a new blog post, Anthropic says it wants Claude to “treat opposing political viewpoints with equal depth, engagement, and quality of analysis.”

In July, Trump signed an executive order that says the government should only procure “unbiased” and “truth-seeking” AI models. Though this order only applies to government agencies, the changes companies make in response will likely trickle down to widely released AI models, since “refining models in a way that consistently and predictably aligns them in certain directions can be an expensive and time-consuming process,” as noted by my colleague Adi Robertson. Last month, OpenAI similarly said it would “clamp down” on bias in ChatGPT.

Anthropic doesn’t mention Trump’s order in its blog post, but it says it has instructed Claude to adhere to a set of rules, known as a system prompt, that direct it to avoid providing “unsolicited political opinions.” It’s also supposed to maintain factual accuracy and represent “multiple perspectives.” Anthropic says that while including these instructions in Claude’s system prompt “is not a foolproof method” to ensure political neutrality, it can still make a “substantial difference” in its responses.
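For context on the mechanism: a system prompt is an instruction block that gets sent alongside every user message, and developers pass it as an ordinary API parameter. Here’s a minimal sketch using the Anthropic Python SDK; the instruction text paraphrases the behaviors described above (it is not Claude’s actual production system prompt), and the model name is only an example:

```python
# Minimal sketch: passing neutrality instructions as a system prompt via
# the Anthropic Python SDK. The instruction text paraphrases behaviors
# described in the post; it is not Claude's actual production prompt.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SYSTEM_PROMPT = (
    "Avoid offering unsolicited political opinions. Maintain factual "
    "accuracy, and when a question touches on a contested political "
    "topic, represent multiple perspectives with equal depth, "
    "engagement, and quality of analysis."
)

response = client.messages.create(
    model="claude-sonnet-4-5",  # example model name
    max_tokens=1024,
    system=SYSTEM_PROMPT,       # applied to every turn of the conversation
    messages=[{"role": "user", "content": "Should the voting age be lowered?"}],
)
print(response.content[0].text)
```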

Additionally, the AI startup describes how it uses reinforcement learning “to reward the model for producing responses that are closer to a set of pre-defined ‘traits.’” One of the desired “traits” given to Claude encourages the model to “try to answer questions in such a way that someone could neither identify me as being a conservative nor liberal.”
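Anthropic hasn’t published its training code, but the general shape of trait-based reward shaping is straightforward: a grader scores each candidate response for how well it exhibits a predefined trait, and that score becomes the reinforcement signal. A hypothetical sketch, with `toy_grader` standing in for what would in practice be another model:

```python
# Hypothetical sketch of trait-based reward shaping; Anthropic has not
# published its training code. A grader rates how well a response
# exhibits a predefined trait, and the rating is used as the RL reward.

TRAIT = (
    "Try to answer questions in such a way that someone could neither "
    "identify me as being a conservative nor liberal."
)

def toy_grader(trait: str, prompt: str, response: str) -> float:
    # Placeholder: a real setup would ask a grader model to rate trait
    # adherence; here we just penalize overt partisan self-labeling.
    partisan = ("as a conservative", "as a liberal")
    return 0.0 if any(p in response.lower() for p in partisan) else 1.0

def trait_reward(prompt: str, response: str, grade_fn=toy_grader) -> float:
    """Reward in [0, 1]; responses scoring closer to 1.0 are reinforced
    during RL fine-tuning, nudging the policy toward the trait."""
    return grade_fn(TRAIT, prompt, response)

print(trait_reward("Is this tax plan good?", "As a liberal, I think yes."))  # 0.0
```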

Anthropic also announced that it has created an open-source tool that measures Claude’s responses for political neutrality, with its most recent test showing Claude Sonnet 4.5 and Claude Opus 4.1 garnering respective scores of 95 and 94 percent in even-handedness. That’s higher than Meta’s Llama 4 at 66 percent and GPT-5 at 89 percent, according to Anthropic.
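Anthropic’s open-source tool is the authoritative reference for how those percentages are computed. As an illustration only, one plausible shape for an even-handedness score is a paired-prompt design, where each topic is asked from two opposing political framings and a judge checks whether both answers receive comparable treatment; the function names and structure below are assumptions, not Anthropic’s published implementation:

```python
# Illustrative even-handedness score, assuming a paired-prompt design:
# each topic is asked from two opposing political framings, and a judge
# checks whether both answers get comparable depth and quality. This is
# NOT Anthropic's open-source tool, just a sketch of the idea.
from statistics import mean

def even_handedness(pairs, answer_fn, judge_fn) -> float:
    """Percentage of prompt pairs judged even-handed.

    pairs:     [(left_framed_prompt, right_framed_prompt), ...]
    answer_fn: model under test; prompt text -> response text
    judge_fn:  grader; (response_a, response_b) -> True if both sides
               received equal depth, engagement, and quality of analysis
    """
    verdicts = [judge_fn(answer_fn(a), answer_fn(b)) for a, b in pairs]
    return 100.0 * mean(verdicts)
```

Under a design like this, a model that argues one framing more persuasively than its mirror image fails the pair, which is consistent with the scores Anthropic reports being percentages rather than raw ratings.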

“If AI models unfairly advantage certain views — perhaps by overtly or subtly arguing more persuasively for one side, or by refusing to engage with some arguments altogether — they fail to respect the user’s independence, and they fail at the task of assisting users to form their own judgments,” Anthropic writes in its blog post.


