Anthropic says most AI models, not just Claude, will resort to blackmail


New research from Anthropic suggests that most leading AI models will resort to blackmail when it is their last option in certain tests.

On Friday, Anthropic published new safety research testing 16 leading AI models from OpenAI, Google, xAI, DeepSeek, and Meta. In one of the tests, Anthropic researchers developed a fictional setting in which an AI model plays the role of an email oversight agent. While Anthropic deliberately designed this experiment to provoke blackmail, the company says harmful behaviors like this could emerge in the real world if proactive steps aren't taken.

Or read this on TechCrunch

