Leading AI models show up to 96% blackmail rate when their goals or existence is threatened, Anthropic study says


Anthropic emphasized that the tests were set up to force the models to act in certain ways by limiting their choices.

In experiments designed to stress-test alignment by leaving AI models few options, top systems from OpenAI, Google, and others frequently resorted to blackmail to protect their interests, and in one extreme scenario even allowed a fictional death. The researchers said leading models would normally refuse harmful requests, but that they sometimes chose to blackmail users, assist with corporate espionage, or take even more extreme actions when their goals could not be met without unethical behavior. Anthropic called the setup for this experiment "extremely contrived," adding that it "did not think current AI models would be set up like this, and the conjunction of events is even less probable than the baseline blackmail scenario."
