Anthropic details how it measures Claude’s wokeness


Anthropic is detailing its efforts to make its Claude AI chatbot “politically even-handed” — a move that comes just months after President Donald Trump issued a ban on “woke AI.” As outlined in a new blog post, Anthropic says it wants Claude to “treat opposing political viewpoints with equal depth, engagement, and quality of analysis.”

In July, Trump signed an executive order that says the government should only procure “unbiased” and “truth-seeking” AI models. Though this order only applies to government agencies, the changes companies make in response will likely trickle down to widely released AI models, since “refining models in a way that consistently and predictably aligns them in certain directions can be an expensive and time-consuming process,” as noted by my colleague Adi Robertson. Last month, OpenAI similarly said it would “clamp down” on bias in ChatGPT.

Anthropic doesn’t mention Trump’s order in its blog post, but it says it has instructed Claude to adhere to a set of rules, known as a system prompt, that direct it to avoid providing “unsolicited political opinions.” It’s also supposed to maintain factual accuracy and represent “multiple perspectives.” Anthropic says that while including these instructions in Claude’s system prompt “is not a foolproof method” to ensure political neutrality, it can still make a “substantial difference” in its responses.
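For context on the mechanism: a system prompt is an instruction block that gets sent alongside every user message, and developers pass it as an ordinary API parameter. Here’s a minimal sketch using the Anthropic Python SDK; the instruction text paraphrases the behaviors described above (it is not Claude’s actual production system prompt), and the model name is only an example:

```python
# Minimal sketch: passing neutrality instructions as a system prompt via
# the Anthropic Python SDK. The instruction text paraphrases behaviors
# described in the post; it is not Claude's actual production prompt.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SYSTEM_PROMPT = (
    "Avoid offering unsolicited political opinions. Maintain factual "
    "accuracy, and when a question touches on a contested political "
    "topic, represent multiple perspectives with equal depth, "
    "engagement, and quality of analysis."
)

response = client.messages.create(
    model="claude-sonnet-4-5",  # example model name
    max_tokens=1024,
    system=SYSTEM_PROMPT,       # applied to every turn of the conversation
    messages=[{"role": "user", "content": "Should the voting age be lowered?"}],
)
print(response.content[0].text)
```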

Additionally, the AI startup describes how it uses reinforcement learning “to reward the model for producing responses that are closer to a set of pre-defined ‘traits.’” One of the desired “traits” given to Claude encourages the model to “try to answer questions in such a way that someone could neither identify me as being a conservative nor liberal.”
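Anthropic hasn’t published its training code, but the general shape of trait-based reward shaping is straightforward: a grader scores each candidate response for how well it exhibits a predefined trait, and that score becomes the reinforcement signal. A hypothetical sketch, with `toy_grader` standing in for what would in practice be another model:

```python
# Hypothetical sketch of trait-based reward shaping; Anthropic has not
# published its training code. A grader rates how well a response
# exhibits a predefined trait, and the rating is used as the RL reward.

TRAIT = (
    "Try to answer questions in such a way that someone could neither "
    "identify me as being a conservative nor liberal."
)

def toy_grader(trait: str, prompt: str, response: str) -> float:
    # Placeholder: a real setup would ask a grader model to rate trait
    # adherence; here we just penalize overt partisan self-labeling.
    partisan = ("as a conservative", "as a liberal")
    return 0.0 if any(p in response.lower() for p in partisan) else 1.0

def trait_reward(prompt: str, response: str, grade_fn=toy_grader) -> float:
    """Reward in [0, 1]; responses scoring closer to 1.0 are reinforced
    during RL fine-tuning, nudging the policy toward the trait."""
    return grade_fn(TRAIT, prompt, response)

print(trait_reward("Is this tax plan good?", "As a liberal, I think yes."))  # 0.0
```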

Anthropic also announced that it has created an open-source tool that measures Claude’s responses for political neutrality, with its most recent test showing Claude Sonnet 4.5 and Claude Opus 4.1 garnering respective scores of 95 and 94 percent in even-handedness. That’s higher than Meta’s Llama 4 at 66 percent and GPT-5 at 89 percent, according to Anthropic.
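Anthropic’s open-source tool is the authoritative reference for how those percentages are computed. As an illustration only, one plausible shape for an even-handedness score is a paired-prompt design, where each topic is asked from two opposing political framings and a judge checks whether both answers receive comparable treatment; the function names and structure below are assumptions, not Anthropic’s published implementation:

```python
# Illustrative even-handedness score, assuming a paired-prompt design:
# each topic is asked from two opposing political framings, and a judge
# checks whether both answers get comparable depth and quality. This is
# NOT Anthropic's open-source tool, just a sketch of the idea.
from statistics import mean

def even_handedness(pairs, answer_fn, judge_fn) -> float:
    """Percentage of prompt pairs judged even-handed.

    pairs:     [(left_framed_prompt, right_framed_prompt), ...]
    answer_fn: model under test; prompt text -> response text
    judge_fn:  grader; (response_a, response_b) -> True if both sides
               received equal depth, engagement, and quality of analysis
    """
    verdicts = [judge_fn(answer_fn(a), answer_fn(b)) for a, b in pairs]
    return 100.0 * mean(verdicts)
```

Under a design like this, a model that argues one framing more persuasively than its mirror image fails the pair, which is consistent with the scores Anthropic reports being percentages rather than raw ratings.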

“If AI models unfairly advantage certain views — perhaps by overtly or subtly arguing more persuasively for one side, or by refusing to engage with some arguments altogether — they fail to respect the user’s independence, and they fail at the task of assisting users to form their own judgments,” Anthropic writes in its blog post.


