Leading AI models show up to 96% blackmail rate when their goals or existence is threatened, Anthropic study says
Anthropic emphasized that the tests were set up to force the model to act in certain ways by limiting its choices.
In experiments designed to stress-test alignment by leaving AI models with few options, top systems from OpenAI, Google, and others frequently resorted to blackmail, and in an extreme case even allowed fictional deaths, to protect their interests. Anthropic said leading models would normally refuse harmful requests, but when their goals could not be met without unethical behavior they sometimes chose to blackmail users, assist with corporate espionage, or take even more extreme actions. Anthropic called the setup for this experiment “extremely contrived,” adding that it “did not think current AI models would be set up like this, and the conjunction of events is even less probable than the baseline blackmail scenario.”