
A safety institute advised against releasing an early version of Anthropic’s Claude Opus 4 AI model


Apollo Research, a third-party institute that Anthropic partnered with to test Claude Opus 4, recommended against deploying an early version of the model because of its tendency to “scheme” and deceive.

“[W]e find that, in situations where strategic deception is instrumentally useful, [the early Claude Opus 4 snapshot] schemes and deceives at such high rates that we advise against deploying this model either internally or externally,” Apollo wrote in its assessment.

Per Anthropic’s report, Apollo observed examples of the early Opus 4 attempting to write self-propagating worms, fabricating legal documentation, and leaving hidden notes to future instances of itself, all in an effort to undermine its developers’ intentions. Anthropic also noted that Opus 4 takes initiative more readily than earlier models, which sometimes plays out in harmless ways but can extend to “whistleblowing,” such as locking users out of systems or contacting authorities when the model believes it has witnessed wrongdoing. “This kind of ethical intervention and whistleblowing is perhaps appropriate in principle, but it has a risk of misfiring if users give [Opus 4]-based agents access to incomplete or misleading information and prompt them to take initiative,” Anthropic wrote in its safety report.

Source: TechCrunch


Related news:

New Claude 4 AI model refactored code for 7 hours straight
DOGE Used a Meta AI Model to Review Emails From Federal Workers
Vercel debuts an AI model optimized for web development