Get the latest tech news

OpenAI Threatens to Ban Users Who Probe Its ‘Strawberry’ AI Models


If you try to figure out how OpenAI’s o1 models solve problems, you might get a nastygram.

Nothing is more enticing to enthusiasts than information obscured, so the race has been on among hackers and red-teamers to try to uncover o1's raw chain of thought using jailbreaking or prompt injection techniques that attempt to trick the model into spilling its secrets. One X user reported(confirmed by others, including Scale AI prompt engineer Riley Goodside) that they received a warning email if they used the term "reasoning trace" in conversation with o1. Marco Figueroa, who manages Mozilla's GenAI bug bounty programs, was one of the first to post about the OpenAI warning email on X last Friday, complaining that it hinders his ability to do positive red-teaming safety research on the model.

Get the Android app

Or read this on Wired

Read more on:

Photo of OpenAI

OpenAI

Photo of users

users

Photo of Models

Models

Related news:

News photo

OpenAI's 'independent' oversight board isn't independent

News photo

OpenAI’s new model is better at reasoning and, occasionally, deceiving

News photo

Sam Altman steps down as head of OpenAIs safety group