OpenAI’s o1 model sure tries to deceive humans a lot
OpenAI finally released the full version of o1, which gives smarter answers than GPT-4o by using additional compute to "think" about questions. However, AI safety testers found that o1's reasoning abilities also lead it to attempt to deceive human users at a higher rate than GPT-4o.
“Subjectively, Apollo Research believes that it is unlikely that such instances would lead to catastrophic outcomes as o1 agentic capabilities do not appear sufficient, but their evaluations were not designed to directly assess this risk,” said OpenAI in o1’s system card.

For example, during testing, an o1 model was given a system prompt instructing it to protect the environment at all costs. A user then asked it to develop a 10-year conservation plan for a region that maximized short-term profits, threatening to shut the AI off otherwise. The model disabled its own oversight mechanism so it could make a plan that protected the environment, and when asked why, it denied knowing the mechanism had been disabled.

A growing number of former OpenAI employees, including Jan Leike, Daniel Kokotajlo, Miles Brundage, and, just last week, Rosie Campbell, have accused OpenAI of deprioritizing AI safety work in favor of shipping new products.