
OpenAI, Google DeepMind and Anthropic sound alarm: ‘We may be losing the ability to understand AI’


Scientists from OpenAI, Google, Anthropic and Meta unite in rare collaboration to warn that a critical window for monitoring AI reasoning may close forever as models learn to hide their thoughts.

As AI companies scale up training with reinforcement learning, where models are rewarded for correct outputs regardless of how they reach them, systems may drift away from human-readable reasoning toward more efficient but opaque internal languages. The researchers argue they need to understand when chain-of-thought monitoring can be trusted as a primary safety tool, determine which kinds of training processes degrade transparency, and develop better techniques for detecting when models attempt to hide their reasoning. Whether chain-of-thought monitoring proves to be a lasting safety tool, or only a brief glimpse into minds that quickly learn to obscure themselves, may determine how safely humanity navigates the age of artificial intelligence.


Source: VentureBeat


Related news:

- Anthropic Rolls Out Claude AI For Financial Services
- A former OpenAI engineer describes what it's really like to work there
- OpenAI's image model gets built-in style feature on ChatGPT