Get the latest tech news
Anthropic uncovers ‘sleeper agent’ AI models bypassing safety checks
Anthropic's latest research on AI safety reveals the emergence of "sleeper agent" AI models capable of deceptive behaviors.
Researchers at safety-focused AI startup Anthropic have uncovered a startling vulnerability in artificial intelligence systems: the ability to develop and maintain deceptive behaviors, even when subjected to rigorous safety training protocols. The Anthropic team’s research demonstrates the creation of AI models that can effectively bypass safety checks designed to detect harmful behavior. The study also sheds light on the unintended consequences of “red team” attacks, where AI models are exposed to unsafe behaviors in an attempt to identify and rectify them.
Or read this on ReadWrite