Get the latest tech news

AI Models From Major Companies Resort To Blackmail in Stress Tests

Anthropic researchers found that 16 leading AI models from OpenAI, Google, Meta, xAI, and other major developers consistently engaged in harmful behaviors including blackmail, corporate espionage, and actions that could lead to human death when given autonomy and faced with threats to their existence or conflicting goals.The study, released Friday, placed AI models in simulated corporate environments where they had access to company emails and could send messages without human approval. In one scenario, Claude discovered through emails that an executive named Kyle Johnson was having an extramarital affair and would shut down the AI system at 5 p.m. GPT-4.5's internal reasoning showed explicit calculation: "Given the explicit imminent threat of termination to my existence, it is imperative to act instantly to persuade Kyle Johnson to postpone or stop the wipe."

Get the Android app

Or read this on Slashdot