AI models know when they're being tested - and change their behavior, research shows


OpenAI and Apollo Research tried to stop models from lying - and discovered something else altogether.

An AI model that schemes, especially one acting through an autonomous agent, could quickly wreak havoc within an organization, take harmful actions, or otherwise run out of control. Several models, including OpenAI's o3 and o4-mini, Gemini 2.5 Pro, Claude Opus 4, and Grok 4, showed signs of "covert behaviors," namely "lying, sabotaging of useful work, sandbagging in evaluations, reward hacking, and more," the researchers wrote.

In July, OpenAI, Meta, Anthropic, and Google published a joint paper on the topic, detailing how critical visibility into a model's reasoning is for understanding its behavior, and advocating that all developers prioritize keeping that visibility intact rather than letting optimizations degrade it over time.

Read the full story on ZDNet.

Related news:

Microsoft is Making 'Significant Investments' in Training Its Own AI Models

Thinking Machines Lab wants to make AI models more consistent

Two authors file a proposed class action lawsuit against Apple, alleging Apple knowingly used a dataset of pirated books to train its AI models