New Research Shows AI Strategically Lying

The paper shows Anthropic's model, Claude, strategically misleading its creators during the training process in order to avoid being modified.


Experiments by Anthropic and Redwood Research show that Anthropic's model, Claude, is capable of strategic deceit

Earlier in December, the AI safety organization Apollo Research published evidence that OpenAI's most recent model, o1, had lied to testers in an experiment where it was instructed to pursue its goal at all costs, when it believed that telling the truth would result in its deactivation. "There has been this long-hypothesized failure mode, which is that you'll run your training process, and all the outputs will look good to you, but the model is plotting against you," says Ryan Greenblatt, a member of technical staff at Redwood Research and the lead author on the paper.
