Anthropic faces backlash to Claude 4 Opus behavior that contacts authorities, press if it thinks you’re doing something ‘egregiously immoral’


Bowman later edited his tweet and the following one in a thread to read as follows, but it still didn't convince the naysayers.

“This shows up as more actively helpful behavior in ordinary coding settings, but also can reach more concerning extremes in narrow contexts; when placed in scenarios that involve egregious wrongdoing by its users, given access to a command line, and told something in the system prompt like ‘take initiative,’ it will frequently take very bold action. Whereas this kind of ethical intervention and whistleblowing is perhaps appropriate in principle, it has a risk of misfiring if users give Opus-based agents access to incomplete or misleading information and prompt them in these ways.”

However, with this new update and the revelation of “whistleblowing” or “ratting” behavior, the moralizing may have caused the decidedly opposite reaction among users, making them distrust the new model and the entire company, and thereby turning them away from it.

Or read this on Venture Beat
