OpenAI, Google DeepMind and Anthropic sound alarm: ‘We may be losing the ability to understand AI’
Scientists from OpenAI, Google, Anthropic and Meta unite in rare collaboration to warn that a critical window for monitoring AI reasoning may close forever as models learn to hide their thoughts.
As AI companies scale up training with reinforcement learning, where models are rewarded for correct outputs regardless of how they reach them, systems may drift away from human-readable reasoning toward more efficient but opaque internal languages. The researchers argue they need to understand when chain-of-thought monitoring can be trusted as a primary safety tool, determine which types of training processes degrade transparency, and develop better techniques for detecting when models attempt to hide their reasoning. Whether chain-of-thought monitoring proves to be a lasting safety tool or a brief glimpse into minds that quickly learn to obscure themselves may determine how safely humanity navigates the age of artificial intelligence.
Or read this on VentureBeat