Tau² benchmark: How a prompt rewrite boosted GPT-5-mini by 22%


We expected small models to be fast, but our benchmarks revealed a common reliability trap. Here’s our deep dive on finding and fixing it.

The evaluation uses the Tau² benchmark, which simulates real-world agent interactions across domains such as telecom, retail, and airlines. Smaller models struggle with long-winded or fuzzy policies but thrive when given structured flows, binary decisions, and lightweight verification steps. With strategic optimization, lightweight models can deliver decent results at a fraction of the cost, making them a compelling alternative when efficiency and affordability matter as much as accuracy.
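To make "structured flows, binary decisions, and lightweight verification steps" concrete, here is a minimal Python sketch of how a fuzzy policy might be rewritten as a numbered yes/no checklist with one closing verification line. The `Step` class, the `render_policy` helper, and the telecom-style wording are all hypothetical and invented for illustration; they are not taken from the article or from the Tau² benchmark.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One binary decision in a hypothetical policy flow."""
    question: str  # yes/no question the agent must answer
    if_yes: str    # instruction when the answer is yes
    if_no: str     # instruction when the answer is no


def render_policy(steps: list[Step], verify: str) -> str:
    """Render the steps as a numbered checklist plus one verification line."""
    lines = []
    for i, step in enumerate(steps, start=1):
        lines.append(f"{i}. {step.question} YES: {step.if_yes} NO: {step.if_no}")
    lines.append(f"Before replying, verify: {verify}")
    return "\n".join(lines)


# Illustrative policy text only; the wording is an assumption for this sketch.
steps = [
    Step("Is the customer's line suspended?",
         "Check for an overdue bill.",
         "Go to step 2."),
    Step("Is airplane mode on?",
         "Ask the customer to turn it off.",
         "Escalate to a human agent."),
]
prompt = render_policy(steps, "each action taken matches one step above.")
print(prompt)
```

Each step forces a single yes/no answer and names the next action, which is the kind of structure the summary says smaller models handle better than long free-form policy text.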
