Study Accuses LM Arena of Helping Top AI Labs Game Its Benchmark

An anonymous reader shares a report: A new paper from AI lab Cohere, Stanford, MIT, and Ai2 accuses LM Arena, the organization behind the popular crowdsourced AI benchmark Chatbot Arena, of helping a select group of AI companies achieve better leaderboard scores at the expense of rivals.

According to the authors, LM Arena allowed some industry-leading AI companies like Meta, OpenAI, Google, and Amazon to privately test several variants of AI models, then not publish the scores of the lowest performers. This made it easier for these companies to achieve a top spot on the platform's leaderboard, though the opportunity was not afforded to every firm, the authors say. "Only a handful of [companies] were told that this private testing was available, and the amount of private testing that some [companies] received is just so much more than others," said Cohere's VP of AI research and co-author of the study, Sara Hooker, in an interview with TechCrunch.
