Meta’s surprise Llama 4 drop exposes the gap between AI ambition and reality
Touted 10M token context proves elusive, while early performance tests disappoint experts.
Meta claims that the larger of its two new Llama 4 models, Maverick, outperforms competitors like OpenAI's GPT-4o and Google's Gemini 2.0 on various technical benchmarks, which, as we often note, are not necessarily reliable reflections of everyday user experience. Even that claim comes with a catch. Willison pointed to a distinction buried in Meta's own announcement: the high-ranking leaderboard entry referred to an "experimental chat version scoring ELO of 1417 on LMArena," a different model from the Maverick build made available for download. Some Reddit users also found it compared unfavorably with competitors such as DeepSeek and Qwen, particularly citing underwhelming performance on coding tasks and software-development benchmarks.
Or read this on Ars Technica