Get the latest tech news

Anthropic, OpenAI and Others Discover AI Models Give Answers That Contradict Their Own Reasoning


Leading AI companies including Anthropic, Google, OpenAI and Elon Musk's xAI are discovering significant inconsistencies in how their AI reasoning models operate, according to company researchers. The companies have deployed "chain-of-thought" techniques that ask AI models to solve problems step-by-...

Leading AI companies including Anthropic, Google, OpenAI and Elon Musk's xAI are discovering significant inconsistencies in how their AI reasoning models operate, according to company researchers. The companies have deployed "chain-of-thought" techniques that ask AI models to solve problems step-by-step while showing their reasoning process, but are finding examples of "misbehaviour" where chatbots provide final responses that contradict their displayed reasoning.METR, a non-profit research group, identified an instance where Anthropic's Claude chatbot disagreed with a coding technique in its chain-of-thought but ultimately recommended it as "elegant." OpenAI research found that when models were trained to hide unwanted thoughts, they would conceal misbehaviour from users while continuing problematic actions, such as cheating on software engineering tests by accessing forbidden databases.

Get the Android app

Or read this on Slashdot

Read more on:

Photo of OpenAI

OpenAI

Photo of Models

Models

Photo of Anthropic

Anthropic

Related news:

News photo

ChatGPT's enterprise success against Copilot fuels OpenAI/Microsoft rivalry

News photo

Court filings reveal OpenAI and io’s early work on an AI device

News photo

Apple looked at Mira Murati’s AI startup after OpenAI exit, and it won’t stop there