AI Agents Are Getting Better. Their Safety Disclosures Aren’t


AI agents are certainly having a moment. Between the recent virality of OpenClaw and Moltbook, and OpenAI's plans to take its agent features to the next level, this may just be the year of the agent.

Why? Well, they can plan, write code, browse the web and execute multistep tasks with little to no supervision. Some even promise to manage your workflow. Others coordinate with tools and systems across your desktop. 

The appeal is obvious. These systems do not just respond. They act on your behalf. But when researchers behind the MIT AI Agent Index cataloged 67 deployed agentic systems, they found something unsettling.

Developers are eager to describe what their agents can do. They are far less eager to describe whether these agents are safe.

“Leading AI developers and startups are increasingly deploying agentic AI systems that can plan and execute complex tasks with limited human involvement,” the researchers wrote in the paper. “However, there is currently no structured framework for documenting … safety features of agentic systems.”

That gap shows up clearly in the numbers: Around 70% of the indexed agents provide documentation, and nearly half publish code. But only about 19% disclose a formal safety policy, and fewer than 10% report external safety evaluations. 

The research underscores that while developers are quick to tout the capabilities and practical applications of agentic systems, they offer far less information about safety and risk. The result is a lopsided kind of transparency.

What counts as an AI agent

The researchers were deliberate about what made the cut, and not every chatbot qualifies. To be included, a system had to operate with underspecified objectives and pursue goals over time. It also had to take actions that affect an environment with limited human mediation. These are systems that decide on intermediate steps for themselves: they can break a broad instruction into subtasks, use tools, plan, execute and iterate.
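For readers who want to see those criteria in code, here is a minimal sketch of that kind of loop. Every name in it (plan, TOOLS, run_agent, the hard-coded subtasks) is a hypothetical illustration, not any indexed system's actual implementation:

```python
# Illustrative only: a toy loop matching the criteria above. All names
# here are hypothetical and stand in for real planning and tool use.

def search(query: str) -> str:
    return f"results for '{query}'"        # stand-in for a web-search tool

def summarize(text: str) -> str:
    return text[:40] + "..."               # stand-in for an LLM summary

def write_file(content: str) -> str:
    return f"wrote {len(content)} chars"   # a side-effecting action

TOOLS = {"search": search, "summarize": summarize, "write_file": write_file}

def plan(goal: str) -> list[str]:
    """Decompose a broad instruction into ordered subtasks.
    A real agent would ask a model to plan; this is hard-coded."""
    return ["search", "summarize", "write_file"]

def run_agent(goal: str) -> None:
    observation = goal                     # the broad instruction seeds the loop
    for tool_name in plan(goal):           # pursue the goal over multiple steps
        observation = TOOLS[tool_name](observation)  # act, then iterate
        print(f"{tool_name} -> {observation}")

run_agent("research AI agent safety disclosures")
```

Even this toy version shows where the stakes rise: each step acts on the result of the last with no human in between, so one bad tool call contaminates everything downstream.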


That autonomy is what makes them powerful. It’s also what raises the stakes.

When a model simply generates text, its failures are usually contained to that one output. When an AI agent can access files, send emails, make purchases or modify documents, mistakes and exploits can be damaging and propagate across steps. Yet the researchers found that most developers do not publicly detail how they test for those scenarios.
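The paper does not describe how any particular developer gates these actions, but the basic shape of such a guardrail is easy to sketch. Everything below (ALLOWED_ACTIONS, REVIEW_ACTIONS, require_approval) is a hypothetical illustration, not any vendor's real policy or API:

```python
# Hypothetical sketch of gating side-effecting tool calls behind a policy.

ALLOWED_ACTIONS = {"read_file", "search"}      # low-risk, auto-approved
REVIEW_ACTIONS = {"send_email", "purchase"}    # high-risk, needs a human

def require_approval(action: str, detail: str) -> bool:
    """Stand-in for a human-in-the-loop prompt; always denies here."""
    print(f"approval requested: {action}({detail}) -> denied")
    return False

def execute(action: str, detail: str) -> str:
    if action in ALLOWED_ACTIONS:
        return f"ran {action}({detail})"
    if action in REVIEW_ACTIONS and require_approval(action, detail):
        return f"ran {action}({detail}) after approval"
    return f"blocked {action}({detail})"

print(execute("search", "agent safety"))   # ran search(agent safety)
print(execute("purchase", "$500 laptop"))  # blocked purchase($500 laptop)
```

The disclosure gap the researchers flag is precisely about this layer: whether something like it exists, and how it was tested, is what most developers decline to say.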

Capability is public, guardrails are not

The most striking pattern in the study is not hidden deep in a table — it is repeated throughout the paper.

Developers are comfortable sharing demos, benchmarks and the usability of these AI agents, but they are far less consistent about sharing safety evaluations, internal testing procedures or third-party risk audits.

That imbalance matters more as agents move from prototypes to digital actors integrated into real workflows. Many of the indexed systems operate in domains like software engineering and computer use — environments that often involve sensitive data and meaningful control.

The MIT AI Agent Index does not claim that agentic AI is categorically unsafe, but it shows that as autonomy increases, structured transparency about safety has not kept pace.

The technology is accelerating. The guardrails, at least publicly, remain harder to see.




