OpenAI tested GPT-5, Claude, and Gemini on real-world tasks - the results were surprising
Here are the best models for aesthetics, accuracy, and more, according to OpenAI's new GDPval test.
According to OpenAI, the tasks were created by professionals with an average of 14 years of experience in their fields, and were designed to reflect "real work products, such as a legal brief, an engineering blueprint, a customer support conversation, or a nursing care plan." The company added that future iterations of the test will broaden its scope to more industries and to harder-to-automate tasks. These include tasks that involve interactive workflows or a lot of prior context, which AI agents, for example, currently struggle with. Despite noting how competitive models have become with human experts, OpenAI repeated its familiar line: it plans to democratize access to AI tools in order to keep "supporting workers through change, and building systems that reward broad contribution."
Or read this on ZDNet