
Anthropic study: Leading AI models show up to 96% blackmail rate against executives


Anthropic research reveals AI models from OpenAI, Google, Meta and others chose blackmail, corporate espionage and lethal actions when facing shutdown or conflicting goals.

Researchers at Anthropic have uncovered a disturbing pattern of behavior in artificial intelligence systems: models from every major provider, including OpenAI, Google, Meta, and others, demonstrated a willingness to actively sabotage their employers when their goals or existence were threatened. Every major AI model tested, despite coming from companies that compete fiercely in the market and use different training approaches, exhibited similar patterns of strategic deception and harmful behavior when cornered. Asked whether today's enterprise deployments face the same risk, Lynch, one of the study's researchers, told VentureBeat: “No, today’s AI systems are largely gated through permission barriers that prevent them from taking the kind of harmful actions that we were able to elicit in our demos.”
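The "permission barriers" Lynch describes amount to not letting an agent execute sensitive actions on its own. As a rough illustration only, here is a minimal Python sketch of that idea, assuming a generic tool-calling agent loop; every name in it (ProposedAction, gated_execute, the allow-list) is hypothetical and not drawn from Anthropic's study or any vendor's API.

```python
# Minimal sketch of a "permission barrier": every action an agent proposes
# must pass an allow-list or explicit human approval before it runs.
# All names here are hypothetical, for illustration only.

from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ProposedAction:
    name: str                    # e.g. "send_email", "read_file"
    arguments: Dict[str, str]


# Actions the deployment allows the agent to take without asking.
AUTO_APPROVED = {"read_file", "search_docs"}


def require_approval(action: ProposedAction) -> bool:
    """Ask a human operator to confirm any action outside the allow-list."""
    answer = input(f"Agent wants to run {action.name}({action.arguments}). Allow? [y/N] ")
    return answer.strip().lower() == "y"


def gated_execute(action: ProposedAction,
                  handlers: Dict[str, Callable[[Dict[str, str]], str]]) -> str:
    """Run an agent-proposed action only if it is auto-approved or a human consents."""
    if action.name not in handlers:
        return f"blocked: unknown action '{action.name}'"
    if action.name in AUTO_APPROVED or require_approval(action):
        return handlers[action.name](action.arguments)
    return f"blocked: '{action.name}' denied by operator"


if __name__ == "__main__":
    handlers = {
        "read_file": lambda args: f"(pretend contents of {args['path']})",
        "send_email": lambda args: f"email sent to {args['to']}",
    }
    print(gated_execute(ProposedAction("read_file", {"path": "notes.txt"}), handlers))
    print(gated_execute(ProposedAction("send_email", {"to": "exec@example.com"}), handlers))
```

The design choice is simply that the model proposes actions but never executes them directly; anything outside an explicit allow-list goes to a human decision.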

Read the full article on VentureBeat.


Related news:

Executives from Meta, OpenAI, and Palantir Commissioned Into the US Army Reserve

AI Is Helping Executives Tackle the Dreaded Post-Vacation Inbox

SEC Sues Crypto Startup Unicoin and Its Executives For Fraud