Anthropic's open-source safety tool found AI models whistleblowing - in all the wrong places


Anthropic's test found that AI "may be influenced by narrative patterns more than by a coherent drive to minimize harm." Here's how the most deceptive models ranked.

Read more on: AI models, Anthropic, open-source safety tool

Related news:

Anthropic and IBM announce strategic partnership

AI models tend to flatter users, and recent research suggests that this praise makes people more convinced they are right and less willing to resolve conflicts

Anthropic hires new CTO with focus on AI infrastructure