
OpenAI tested GPT-5, Claude, and Gemini on real-world tasks - the results were surprising

Here are the best models for aesthetics, accuracy, and more, according to OpenAI's new GDPval test.

According to OpenAI, the tasks were created by professionals with an average of 14 years of experience in relevant fields to reflect "real work products, such as a legal brief, an engineering blueprint, a customer support conversation, or a nursing care plan." The company added that future iterations of the test will try to cover more ground by spanning more industries and harder-to-automate tasks, such as those that involve interactive workflows or lots of prior context (something AI agents, for example, currently struggle with). Despite noting how competitive models have become with human experts, OpenAI reiterated its familiar line that it plans to democratize access to AI tools in order to keep "supporting workers through change, and building systems that reward broad contribution."


Read the full article on ZDNet