Get the latest tech news

OpenAI’s new model is better at reasoning and, occasionally, deceiving


OpenAI’s new model is able to “lie,” or simulate compliance with a given task.

Hobbhahn says the difference is due to this model’s ability to “reason” through the chain of thought process and the way it’s paired with reinforcement learning, which teaches the system through rewards and penalties.During testing, Apollo discovered that the AI simulated alignment with its developers’ expectations and manipulated tasks to appear compliant. The fact that this model lies a small percentage of the time in safety tests doesn’t signal an imminent Terminator-style apocalypse, but it’s valuable to catch before rolling out future iterations at scale (and good for users to know, too). Quiñonero Candela told me that the company does monitor this and plans to scale it by combining models that are trained to detect any kind of misalignment with human experts reviewing flagged cases (paired with continued research in alignment).

Get the Android app

Or read this on The Verge

Read more on:

Photo of OpenAI

OpenAI

Photo of new model

new model

Photo of reasoning

reasoning

Related news:

News photo

Sam Altman steps down as head of OpenAIs safety group

News photo

OpenAI's new safety board has more power and no Sam Altman

News photo

SambaNova challenges OpenAI’s o1 model with Llama 3.1-powered demo on HuggingFace