
Evaluating GPT-5's reasoning ability using the Only Connect game show

We benchmarked the latest GPT-5 models on Only Connect challenges to measure pattern recognition, lateral thinking, and multi-step inference, going beyond what traditional knowledge-based tests measure.

For rounds requiring straight guesses, we provided LLMs with all available clues and used structured output parameters to receive JSON responses. We'll publish the complete dataset this week alongside a granular analysis identifying which questions posed the greatest challenges for models. We'll also implement a more realistic competitive format, pairing models against each other and allowing points for correctly answering questions opponents miss.
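
As a rough illustration of the two ideas above (JSON answers from structured output, and points for questions an opponent misses), here is a minimal Python sketch. It is not the authors' harness: the `connection` field name, the case-insensitive exact-match grading, and the one-point value are all assumptions made for the example, and the model call itself is left out.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class Guess:
    connection: str  # hypothetical field name, not taken from the article


def parse_guess(raw: str) -> Optional[Guess]:
    """Parse a model's JSON reply; return None if it is malformed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    connection = data.get("connection")
    if not isinstance(connection, str) or not connection.strip():
        return None
    return Guess(connection=connection.strip())


def is_correct(guess: Optional[Guess], accepted: Iterable[str]) -> bool:
    """Assumed grading rule: case-insensitive exact match against accepted answers."""
    if guess is None:
        return False
    return guess.connection.casefold() in {a.casefold() for a in accepted}


def score_head_to_head(
    first_raw: str, second_raw: str, accepted: Iterable[str], points: int = 1
) -> Tuple[int, int]:
    """Score one question: the first model answers, and if it misses,
    the second model can take the points instead (assumed rule)."""
    accepted = list(accepted)
    if is_correct(parse_guess(first_raw), accepted):
        return (points, 0)
    if is_correct(parse_guess(second_raw), accepted):
        return (0, points)
    return (0, 0)
```

For example, `score_head_to_head('not json', '{"connection": "Prime numbers"}', ["prime numbers"])` gives the points to the second model, because the first reply fails to parse. A real evaluation would likely need more lenient answer matching than exact string comparison.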
