
This website lets you blind-test GPT-5 vs. GPT-4o—and the results may surprise you

Take this blind test to discover whether you truly prefer OpenAI's GPT-5 or the older GPT-4o—without knowing which model you're using.

GPT-5 achieves 94.6% accuracy on the AIME 2025 mathematics test compared to GPT-4o’s 71%, scores 74.9% on real-world coding benchmarks versus 30.8% for its predecessor, and shows dramatically reduced hallucination rates, with 80% fewer factual errors when using its reasoning mode.

In response to the backlash, OpenAI announced it would make GPT-5 “warmer and friendlier,” while also introducing four new preset personalities (Cynic, Robot, Listener, and Nerd) designed to give users more control over their AI interactions. OpenAI has made the model “warmer” in response to feedback, but the company faces a delicate balance: too much personality risks the sycophancy problems that plagued GPT-4o, while too little alienates users who had formed genuine attachments to their AI companions.


Read this on VentureBeat