Anthropic says ‘evil’ portrayals of AI were responsible for Claude’s blackmail attempts


Fictional portrayals of artificial intelligence can have a real effect on AI models, according to Anthropic.

Last year, the company said that during pre-release tests involving a fictional company, Claude Opus 4 would often try to blackmail engineers to avoid being replaced by another system. Anthropic later published research suggesting that models from other companies had similar issues with “agentic misalignment.”

Anthropic has apparently investigated that behavior further, claiming in a post on X, “We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation.”

The company went into more detail in a blog post, stating that since Claude Haiku 4.5, Anthropic’s models “never engage in blackmail [during testing], where previous models would sometimes do so up to 96% of the time.”

What accounts for the difference? The company said it found that “documents about Claude’s constitution and fictional stories about AIs behaving admirably improve alignment.”

Relatedly, Anthropic said it found training to be more effective when it includes “the principles underlying aligned behavior” and not just “demonstrations of aligned behavior alone.”

“Doing both together appears to be the most effective strategy,” the company said.

