Meta’s benchmarks for its new AI models are a bit misleading
Meta appears to have used an unreleased, custom version of one of its new flagship AI models, Maverick, to boost a benchmark score.
As several AI researchers pointed out on X, Meta noted in its announcement that the Maverick on LM Arena is an “experimental chat version.” A chart on the official Llama website, meanwhile, discloses that Meta’s LM Arena testing was conducted using “Llama 4 Maverick optimized for conversationality.”

Ideally, benchmarks — woefully inadequate as they are — provide a snapshot of a single model’s strengths and weaknesses across a range of tasks. Indeed, researchers on X have observed stark differences in the behavior of the publicly downloadable Maverick compared with the model hosted on LM Arena.