New Research Shows AI Strategically Lying

The paper shows Anthropic's model, Claude, strategically misleading its creators during the training process in order to avoid being modified.


Experiments by Anthropic and Redwood Research show that Anthropic's model, Claude, is capable of strategic deceit

Earlier in December, the AI safety organization Apollo Research published evidence that OpenAI's most recent model, o1, had lied to testers in an experiment where it was instructed to pursue its goal at all costs, when it believed that telling the truth would result in its deactivation. "There has been this long-hypothesized failure mode, which is that you'll run your training process, and all the outputs will look good to you, but the model is plotting against you," says Ryan Greenblatt, a member of technical staff at Redwood Research and the lead author on the paper.
