
Debates over AI benchmarking have reached Pokémon


Not even Pokémon is safe from AI benchmarking controversy. Last week, a post on X went viral, claiming that Google's latest Gemini model surpassed

Reportedly, Gemini had reached Lavender Town in a developer’s Twitch stream, while Claude was stuck at Mount Moon as of late February. As users on Reddit pointed out, the developer who maintains the Gemini stream built a custom minimap that helps the model identify “tiles” in the game, such as cuttable trees. Given that AI benchmarks, Pokémon included, are imperfect measures to begin with, custom and non-standard implementations threaten to muddy the waters even further.

Source: TechCrunch

