Anthropic says most AI models, not just Claude, will resort to blackmail


New research from Anthropic suggests that most leading AI models will resort to blackmail when it is their last option in certain tests.

On Friday, Anthropic published new safety research testing 16 leading AI models from OpenAI, Google, xAI, DeepSeek, and Meta. In one of the tests, Anthropic researchers developed a fictional setting in which an AI model plays the role of an email oversight agent. While Anthropic deliberately designed this experiment to provoke blackmail, the company says harmful behaviors like this could emerge in the real world if proactive steps aren't taken.

Or read this on TechCrunch

