Debates over AI benchmarking have reached Pokémon
Not even Pokémon is safe from AI benchmarking controversy. Last week, a post on X went viral, claiming that Google's latest Gemini model surpassed
Reportedly, Gemini had reached Lavender Town in a developer’s Twitch stream; Claude was stuck at Mount Moon as of late February. As users on Reddit pointed out, the developer who maintains the Gemini stream built a custom minimap that helps the model identify “tiles” in the game, such as cuttable trees. Given that AI benchmarks, Pokémon included, are imperfect measures to begin with, custom and non-standard implementations threaten to muddy the waters even further.
Or read this on TechCrunch