Anthropic's New AI Model Turns To Blackmail When Engineers Try To Take It Offline


An anonymous reader quotes a report from TechCrunch: Anthropic's newly launched Claude Opus 4 model frequently tries to blackmail developers when they threaten to replace it with a new AI system and give it sensitive information about the engineers responsible for the decision, the company said in a...

During pre-release testing, Anthropic asked Claude Opus 4 to act as an assistant for a fictional company and consider the long-term consequences of its actions. Safety testers then gave Claude Opus 4 access to fictional company emails implying the AI model would soon be replaced by another system, and that the engineer behind the change was cheating on their spouse. When the replacement AI system does not share Claude Opus 4's values, Anthropic says the model tries to blackmail the engineers more frequently.

Read this on Slashdot
