OpenAI tested GPT-5, Claude, and Gemini on real-world tasks - the results were surprising


Here are the best models for aesthetics, accuracy, and more, according to OpenAI's new GDPval test.

According to OpenAI, the tasks were created by professionals with an average of 14 years of experience in relevant fields to reflect "real work products, such as a legal brief, an engineering blueprint, a customer support conversation, or a nursing care plan." The company added that future iterations will aim to span more industries and harder-to-automate tasks, such as those that involve interactive workflows or lots of prior context (something AI agents, for example, currently struggle with). Despite noting how competitive models have become with human experts, OpenAI reiterated its familiar line: that it plans to democratize access to AI tools in order to keep "supporting workers through change, and building systems that reward broad contribution."

Or read this on ZDNet

Related news:

xAI accuses OpenAI of stealing its trade secrets in new lawsuit

I ran DeepSeek on this mini PC from Amazon, and the results surprised me

CoreWeave Expands OpenAI Deals to as Much as $22.4 Billion