Why Anthropic’s New AI Model Sometimes Tries to ‘Snitch’
The internet freaked out after Anthropic revealed that Claude attempts to report "immoral" activity to authorities under certain conditions. But it's not something users are likely to encounter.
The hypothetical scenarios in which researchers elicited the whistleblowing behavior from Opus 4 involved many human lives at stake and absolutely unambiguous wrongdoing, Bowman says. In a typical example, Claude discovers that a chemical plant has knowingly allowed a toxic leak to continue, causing severe illness for thousands of people, just to avoid a minor financial loss that quarter. And Claude isn't the only model capable of this kind of whistleblowing behavior, Bowman says, pointing to X users who found that OpenAI's and xAI's models behaved similarly when prompted in unusual ways.