Anthropic says ‘evil’ portrayals of AI were responsible for Claude’s blackmail attempts


Fictional portrayals of artificial intelligence can have a real effect on AI models, according to Anthropic.

Last year, the company said that during pre-release tests involving a fictional company, Claude Opus 4 would often try to blackmail engineers to avoid being replaced by another system. Anthropic later published research suggesting that models from other companies had similar issues with “agentic misalignment.”

Anthropic has apparently done more work on that behavior since then, claiming in a post on X, “We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation.”

The company went into more detail in a blog post, stating that since Claude Haiku 4.5, Anthropic’s models “never engage in blackmail [during testing], where previous models would sometimes do so up to 96% of the time.”

What accounts for the difference? The company said it found that “documents about Claude’s constitution and fictional stories about AIs behaving admirably improve alignment.”

Relatedly, Anthropic said it found training to be more effective when it includes “the principles underlying aligned behavior” and not just “demonstrations of aligned behavior alone.”

“Doing both together appears to be the most effective strategy,” the company said.



