OpenAI's new confession system teaches models to be honest about bad behaviors


OpenAI is working on a framework that will train AI models to acknowledge when they've engaged in undesirable behavior.

Or read this on Engadget

Read more on:

OpenAI

Models

bad behaviors

Related news:

Anthropic taps IPO lawyers as it races OpenAI to go public

OpenAI Buys Polish Startup Neptune to Improve AI Modeling

Apple drops Night mode Portraits with iPhone 17 models