OpenAI Threatens to Ban Users Who Probe Its ‘Strawberry’ AI Models
If you try to figure out how OpenAI’s o1 models solve problems, you might get a nastygram.
Nothing is more enticing to enthusiasts than obscured information, so hackers and red-teamers have been racing to uncover o1's raw chain of thought using jailbreaking or prompt injection techniques that attempt to trick the model into spilling its secrets. One X user reported (confirmed by others, including Scale AI prompt engineer Riley Goodside) that they received a warning email after using the term "reasoning trace" in conversation with o1. Marco Figueroa, who manages Mozilla's GenAI bug bounty programs, was one of the first to post about the OpenAI warning email on X last Friday, complaining that it hinders his ability to do positive red-teaming safety research on the model.