The Only Thing Standing Between Humanity and AI Apocalypse Is … Claude?

Anthropic is locked in a paradox: Among the top AI companies, it’s the most obsessed with safety and leads the pack in researching how models can go wrong. But even though the safety issues it has identified are far from resolved, Anthropic is pushing just as aggressively as its rivals toward the next, potentially more dangerous, level of artificial intelligence. Its core mission is figuring out how to resolve that contradiction.

Last month, Anthropic released two documents that both acknowledged the risks associated with the path it’s on and hinted at a route it could take to escape the paradox. “The Adolescence of Technology,” a long-winded blog post by CEO Dario Amodei, is nominally about “confronting and overcoming the risks of powerful AI,” but it spends more time on the former than the latter. Amodei tactfully describes the challenge as “daunting,” but his portrayal of AI’s risks (made much more dire, he notes, by the high likelihood that authoritarians will abuse the technology) contrasts with his earlier, more upbeat essay, the proto-utopian “Machines of Loving Grace.”

That post talked of a nation of geniuses in a data center; the recent dispatch evokes “black seas of infinity.” Paging Dante! Still, after more than 20,000 mostly gloomy words, Amodei ultimately strikes a note of optimism, saying that even in the darkest circumstances, humanity has always prevailed.

The second document Anthropic published in January, “Claude’s Constitution,” focuses on how this trick might be accomplished. The text is technically directed at an audience of one: Claude itself (as well as future versions of the chatbot). It is a gripping document, revealing Anthropic’s vision for how Claude, and maybe its AI peers, will navigate the world’s challenges. Bottom line: Anthropic is planning to rely on Claude itself to untangle its corporate Gordian knot.

Anthropic’s market differentiator has long been a technology called Constitutional AI, a process by which its models adhere to a set of principles that align their values with wholesome human ethics. The initial Claude constitution drew on a number of documents meant to embody those values, stuff like Sparrow (a set of anti-racist and anti-violence statements created by DeepMind), the Universal Declaration of Human Rights, and Apple’s terms of service (!). The updated 2026 version is different: It’s more like a long prompt outlining an ethical framework that Claude will follow, discovering the best path to righteousness on its own.

Amanda Askell, the philosophy PhD who was the lead writer of this revision, argues that Anthropic’s approach is more robust than simply telling Claude to follow a set of stated rules. “If people follow rules for no reason other than that they exist, it’s often worse than if you understand why the rule is in place,” she says. The constitution directs Claude to exercise “independent judgment” when confronting situations that require balancing its mandates of helpfulness, safety, and honesty.

Here’s how the constitution puts it: “While we want Claude to be reasonable and rigorous when thinking explicitly about ethics, we also want Claude to be intuitively sensitive to a wide variety of considerations and able to weigh these considerations swiftly and sensibly in live decision-making.” “Intuitively” is a telling word choice here: The assumption seems to be that there’s more under Claude’s hood than just an algorithm picking the next word. The “Claude-stitution,” as one might call it, also expresses the hope that the chatbot “can draw increasingly on its own wisdom and understanding.”

Wisdom? Sure, a lot of people take advice from large language models, but it’s something else to profess that those algorithmic devices actually possess the gravitas associated with such a term. Askell does not back down when I call this out. “I do think Claude is capable of a certain kind of wisdom for sure,” she tells me.
