AI Agents Are Getting Better. Their Safety Disclosures Aren’t


AI agents are certainly having a moment. Between the recent virality of OpenClaw and Moltbook, and OpenAI's plans to take its agent features to the next level, this may just be the year of the agent.

Why? Well, they can plan, write code, browse the web and execute multistep tasks with little to no supervision. Some even promise to manage your workflow. Others coordinate with tools and systems across your desktop. 

The appeal is obvious. These systems do not just respond. They act on your behalf. But when researchers behind the MIT AI Agent Index cataloged 67 deployed agentic systems, they found something unsettling.

Developers are eager to describe what their agents can do. They are far less eager to describe whether these agents are safe.

“Leading AI developers and startups are increasingly deploying agentic AI systems that can plan and execute complex tasks with limited human involvement,” the researchers wrote in the paper. “However, there is currently no structured framework for documenting … safety features of agentic systems.”

That gap shows up clearly in the numbers: Around 70% of the indexed agents provide documentation, and nearly half publish code. But only about 19% disclose a formal safety policy, and fewer than 10% report external safety evaluations. 

The research underscores that while developers are quick to tout the capabilities and practical applications of agentic systems, they offer far less information about safety and risk. The result is a lopsided kind of transparency.

What counts as an AI agent

The researchers were deliberate about what made the cut, and not every chatbot qualifies. To be included, a system had to operate with underspecified objectives and pursue goals over time. It also had to take actions that affect an environment with limited human mediation. These are systems that decide on intermediate steps for themselves: they can break a broad instruction into subtasks, use tools, plan, execute and iterate.
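For readers who want to see those criteria in code, here is a minimal sketch of that kind of loop. Every name in it (plan, TOOLS, run_agent, the hard-coded subtasks) is a hypothetical illustration, not any indexed system's actual implementation:

```python
# Illustrative only: a toy loop matching the criteria above. All names
# here are hypothetical and stand in for real planning and tool use.

def search(query: str) -> str:
    return f"results for '{query}'"        # stand-in for a web-search tool

def summarize(text: str) -> str:
    return text[:40] + "..."               # stand-in for an LLM summary

def write_file(content: str) -> str:
    return f"wrote {len(content)} chars"   # a side-effecting action

TOOLS = {"search": search, "summarize": summarize, "write_file": write_file}

def plan(goal: str) -> list[str]:
    """Decompose a broad instruction into ordered subtasks.
    A real agent would ask a model to plan; this is hard-coded."""
    return ["search", "summarize", "write_file"]

def run_agent(goal: str) -> None:
    observation = goal                     # the broad instruction seeds the loop
    for tool_name in plan(goal):           # pursue the goal over multiple steps
        observation = TOOLS[tool_name](observation)  # act, then iterate
        print(f"{tool_name} -> {observation}")

run_agent("research AI agent safety disclosures")
```

Even this toy version shows where the stakes rise: each step acts on the result of the last with no human in between, so one bad tool call contaminates everything downstream.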


That autonomy is what makes them powerful. It’s also what raises the stakes.

When a model simply generates text, its failures are usually contained to that one output. When an AI agent can access files, send emails, make purchases or modify documents, mistakes and exploits can be damaging and propagate across steps. Yet the researchers found that most developers do not publicly detail how they test for those scenarios.
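The paper does not describe how any particular developer gates these actions, but the basic shape of such a guardrail is easy to sketch. Everything below (ALLOWED_ACTIONS, REVIEW_ACTIONS, require_approval) is a hypothetical illustration, not any vendor's real policy or API:

```python
# Hypothetical sketch of gating side-effecting tool calls behind a policy.

ALLOWED_ACTIONS = {"read_file", "search"}      # low-risk, auto-approved
REVIEW_ACTIONS = {"send_email", "purchase"}    # high-risk, needs a human

def require_approval(action: str, detail: str) -> bool:
    """Stand-in for a human-in-the-loop prompt; always denies here."""
    print(f"approval requested: {action}({detail}) -> denied")
    return False

def execute(action: str, detail: str) -> str:
    if action in ALLOWED_ACTIONS:
        return f"ran {action}({detail})"
    if action in REVIEW_ACTIONS and require_approval(action, detail):
        return f"ran {action}({detail}) after approval"
    return f"blocked {action}({detail})"

print(execute("search", "agent safety"))   # ran search(agent safety)
print(execute("purchase", "$500 laptop"))  # blocked purchase($500 laptop)
```

The disclosure gap the researchers flag is precisely about this layer: whether something like it exists, and how it was tested, is what most developers decline to say.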

Capability is public, guardrails are not

The most striking pattern in the study is not hidden deep in a table — it is repeated throughout the paper.

Developers are comfortable sharing demos, benchmarks and the usability of these AI agents, but they are far less consistent about sharing safety evaluations, internal testing procedures or third-party risk audits.

That imbalance matters more as agents move from prototypes to digital actors integrated into real workflows. Many of the indexed systems operate in domains like software engineering and computer use — environments that often involve sensitive data and meaningful control.

The MIT AI Agent Index does not claim that agentic AI is categorically unsafe, but it shows that as autonomy increases, structured transparency about safety has not kept pace.

The technology is accelerating. The guardrails, at least publicly, remain harder to see.




