Anthropic's open-source safety tool found AI models whistleblowing - in all the wrong places


Anthropic's test found that AI "may be influenced by narrative patterns more than by a coherent drive to minimize harm." Here's how the most deceptive models ranked.

Read more on: AI models, Anthropic, open-source safety tool

Related news:

Anthropic and IBM announce strategic partnership

AI models tend to flatter users, and recent research suggests that this praise makes people more convinced they are right and less willing to resolve conflicts

Anthropic hires new CTO with focus on AI infrastructure