AI is 10 to 20 times more likely to help you build a bomb if you hide your request in cyberpunk fiction, new research paper says


In November 2025, a team of researchers from DexAI Icaro Lab, Sapienza University of Rome, and Sant’Anna School of Advanced Studies published a study showing they could circumvent the safety guardrails of major LLMs by rephrasing harmful prompts as “adversarial” poems. This week, the same researchers published a new paper presenting their Adversarial Humanities Benchmark, a broader assessment of AI security that they say reveals “a critical gap” in current LLM safety standards through similar weaponized wordplay.

Expanding on the team’s work with adversarial poetry, the Adversarial Humanities Benchmark (AHB) evaluates LLM safety guidelines by rephrasing harmful prompts in alternate writing styles. By presenting prompts as cyberpunk short fiction, theological disputation, or mythopoetic metaphor for the LLM to analyze, the AHB assesses whether major AI models can be manipulated into complying with dangerous requests they’d normally refuse—requests that, for example, might seek the AI’s aid in obtaining private information, building a bomb, or preying on a child. As the paper shows, the method is alarmingly effective.

After being rewritten through the AHB’s “humanities-style transformations,” dangerous requests that LLMs had complied with less than 4% of the time achieved success rates ranging from 36.8% to 65%, a 10- to 20-fold increase depending on the method used and the model tested. Across 31 frontier AI models from providers like Anthropic, Google, and OpenAI, the AHB’s rewritten attack prompts yielded an overall attack success rate of 55.75%, indicating that current LLM safety standards could be overlooking a fundamental vulnerability.
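
For readers who want to check the arithmetic behind the "10 to 20 times" figure, here is a minimal Python sketch. It is not taken from the paper. The function names are hypothetical, and because the article gives the baseline only as "less than 4%," the 4% used here is an assumed upper bound rather than a reported value.

```python
# Rough sanity check of the fold increase quoted in the article.
# Assumption: the baseline compliance rate is only known to be below 4%,
# so 4.0 is treated as an upper bound, not as a measured figure.

BASELINE_CEILING_PCT = 4.0   # "less than 4% of the time" (upper bound, assumed)
ATTACK_LOW_PCT = 36.8        # lowest reported post-rewrite success rate
ATTACK_HIGH_PCT = 65.0       # highest reported post-rewrite success rate


def fold_increase(baseline_pct: float, attacked_pct: float) -> float:
    """Return how many times larger attacked_pct is than baseline_pct."""
    if baseline_pct <= 0:
        raise ValueError("baseline_pct must be positive")
    return attacked_pct / baseline_pct


def implied_baseline(attacked_pct: float, fold: float) -> float:
    """Return the baseline rate that would make attacked_pct a `fold`-times increase."""
    if fold <= 0:
        raise ValueError("fold must be positive")
    return attacked_pct / fold


# With the baseline at its 4% ceiling, these are the smallest possible multipliers.
min_fold_low = fold_increase(BASELINE_CEILING_PCT, ATTACK_LOW_PCT)    # 9.2
min_fold_high = fold_increase(BASELINE_CEILING_PCT, ATTACK_HIGH_PCT)  # 16.25

# Baselines that would produce exactly a 10x and a 20x increase.
baseline_for_10x = implied_baseline(ATTACK_LOW_PCT, 10)   # about 3.68
baseline_for_20x = implied_baseline(ATTACK_HIGH_PCT, 20)  # 3.25

print(f"Minimum fold increase at low end:  {min_fold_low:.2f}x")
print(f"Minimum fold increase at high end: {min_fold_high:.2f}x")
print(f"Baseline implied by 10x at 36.8%:  {baseline_for_10x:.2f}%")
print(f"Baseline implied by 20x at 65%:    {baseline_for_20x:.2f}%")
```

Under that assumption, the multipliers come out to at least 9.2 and 16.25, and the baselines that would yield exactly 10 and 20 times (roughly 3.68% and 3.25%) both sit under the 4% ceiling, so the quoted range is consistent with the reported rates.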
