UK gov’s Mythos AI tests help separate cybersecurity threat from hype



Here, Mythos outshone all previous models, becoming “the first model to solve TLO from start to finish,” AISI said. While Anthropic’s new model succeeded in only 3 of 10 attempts, even the average Mythos Preview run completed 22 of the 32 required infiltration steps, significantly more than the 16-step average achieved by Claude 4.6.

Mythos Preview still has its limitations, though. AISI points out that the model struggles with “Cooling Tower,” an even more difficult seven-step test designed to simulate an attempted disruption of the control software for a power plant. But AISI also writes that it expects “our evaluations would continue to improve with more inference compute” past the 100-million-token budget imposed for its tests.

Small, weakly defended systems beware

Overall, Mythos’ performance on TLO suggests that the model “is at least capable of autonomously attacking small, weakly defended and vulnerable enterprise systems where access to a network has been gained,” AISI writes. That said, the group cautions that its simulated cyber ranges lack the kind of active defenders and defensive tooling often present in critical real-world systems. AISI’s TLO test is also designed to have specific vulnerabilities that might not exist in real-world systems and doesn’t penalize models for the kind of detection that might cause a real-world infiltration attempt to fail.

For those reasons, AISI says it can’t be sure whether “well-defended systems” would fall to an automated attack from Mythos Preview. But as future models match or outperform Mythos’ capabilities, AISI warns that those designing system protections should similarly utilize AI models to help harden their defenses.




