The AI industry is obsessed with Chatbot Arena, but it might not be the best benchmark


LMSYS' Chatbot Arena is perhaps the most popular AI benchmark today, and an industry obsession. But it's far from a perfect measure.

LMSYS’ creators felt similarly, and so they devised an alternative: Chatbot Arena, a crowdsourced benchmark designed to capture the “nuanced” aspects of models and their performance on open-ended, real-world tasks.

Yuchen Lin, a research scientist at the non-profit Allen Institute for AI, says that LMSYS hasn’t been completely transparent about the model capabilities, knowledge and skills it’s assessing on Chatbot Arena.

Lending credence to his theory, the top questions in the LMSYS-Chat-1M data set pertain to programming, AI tools, software bugs and fixes, and app design, not the sorts of things you’d expect non-technical people to ask about.

Read this on TechCrunch

Related news:

GitHub CEO Thomas Dohmke says the AI industry needs competition to thrive

How Salesforce’s MINT-1T dataset could disrupt the AI industry

OpenAI's newly released GPT-4o mini dominates the Chatbot Arena. Here's why.