Get the latest tech news

OpenAI’s o3 AI model scores lower on a benchmark than the company initially implied

A discrepancy between first- and third-party benchmark results for OpenAI's o3 AI model is raising questions about the company's transparency and model testing practices.

When OpenAI unveiled o3 in December, the company claimed the model could answer just over a fourth of questions on FrontierMath, a challenging set of math problems. As it turns out, that figure was likely an upper bound, achieved by a version of o3 with more computing behind it than the model OpenAI publicly launched last week. Benchmarking “controversies” are becoming a common occurrence in the AI industry as vendors race to capture headlines and mindshare with new models.

Get the Android app

Or read this on TechCrunch