OpenAI may ‘adjust’ its safeguards if rivals release ‘high-risk’ AI


OpenAI has updated its Preparedness Framework, the internal system it uses to decide whether AI models are safe and what safeguards, if any, are needed during development and release. In the update, the company said that it may “adjust” its requirements if a rival AI lab releases a “high-risk” system without comparable safeguards.

The change reflects the increasing competitive pressures on commercial AI developers to deploy models quickly. OpenAI has been accused of lowering safety standards in favor of faster releases, and of failing to deliver timely reports detailing its safety testing. Last week, 12 ex-OpenAI employees filed a proposed amicus brief in Elon Musk’s case against OpenAI, arguing that a for-profit OpenAI might be encouraged to cut even more corners on safety work. 

Perhaps anticipating criticism, OpenAI claims that it wouldn’t make these policy adjustments lightly, and that it would keep its safeguards at “a level more protective.”

“If another frontier AI developer releases a high-risk system without comparable safeguards, we may adjust our requirements,” wrote OpenAI in a blog post published Tuesday afternoon. “However, we would first rigorously confirm that the risk landscape has actually changed, publicly acknowledge that we are making an adjustment, assess that the adjustment does not meaningfully increase the overall risk of severe harm, and still keep safeguards at a level more protective.”

The refreshed Preparedness Framework also makes clear that OpenAI is relying more heavily on automated evaluations to speed up product development. The company says that, while it hasn’t abandoned human-led testing altogether, it has built “a growing suite of automated evaluations” that can supposedly “keep up with [a] faster [release] cadence.”

Some reports contradict this. According to the Financial Times, OpenAI gave testers less than a week for safety checks for an upcoming major model — a compressed timeline compared to previous releases. The publication’s sources also alleged that many of OpenAI’s safety tests are now conducted on earlier versions of models than the versions released to the public.

In statements, OpenAI has disputed the notion that it’s compromising on safety.

Other changes to OpenAI’s framework pertain to how the company categorizes models according to risk, including models that can conceal their capabilities, evade safeguards, prevent their shutdown, and even self-replicate. OpenAI says that it’ll now focus on whether models meet one of two thresholds: “high” capability or “critical” capability.

OpenAI defines the former as models that could “amplify existing pathways to severe harm,” and the latter as models that “introduce unprecedented new pathways to severe harm.”

“Covered systems that reach high capability must have safeguards that sufficiently minimize the associated risk of severe harm before they are deployed,” wrote OpenAI in its blog post. “Systems that reach critical capability also require safeguards that sufficiently minimize associated risks during development.”

The updates are the first OpenAI has made to the Preparedness Framework since 2023.
