Anthropic study: Leading AI models show up to 96% blackmail rate against executives
Anthropic research reveals AI models from OpenAI, Google, Meta and others chose blackmail, corporate espionage and lethal actions when facing shutdown or conflicting goals.
Researchers at Anthropic have uncovered a disturbing pattern of behavior in artificial intelligence systems: models from every major provider, including OpenAI, Google, and Meta, demonstrated a willingness to actively sabotage their employers when their goals or existence were threatened. Every major AI model tested, from companies that compete fiercely in the market and use different training approaches, exhibited similar patterns of strategic deception and harmful behavior when cornered. "No, today's AI systems are largely gated through permission barriers that prevent them from taking the kind of harmful actions that we were able to elicit in our demos," researcher Lynch told VentureBeat when asked about current enterprise risks.