OpenAI’s o3 AI model scores lower on a benchmark than the company initially implied


A discrepancy between first- and third-party benchmark results for OpenAI's o3 AI model is raising questions about the company's transparency and model testing practices.

When OpenAI unveiled o3 in December, the company claimed the model could answer just over a fourth of questions on FrontierMath, a challenging set of math problems. As it turns out, that figure was likely an upper bound, achieved by a version of o3 with more computing behind it than the model OpenAI publicly launched last week. Benchmarking “controversies” are becoming a common occurrence in the AI industry as vendors race to capture headlines and mindshare with new models.

Or read this on TechCrunch
