OpenAI Has Trained Its LLM To Confess To Bad Behavior


An anonymous reader quotes a report from MIT Technology Review: OpenAI is testing another new way to expose the complicated processes at work inside large language models. Researchers at the company can make an LLM produce what they call a confession, in which the model explains how it carried out a...

Read this on Slashdot

Read more on: OpenAI, LLM, bad behavior

Related news:

Enthusiasm for OpenAI’s Sora Fades After Initial Creative Burst

OpenAI Calls a ‘Code Red’ + Which Model Should I Use? + The Hard Fork Review of Slop