The AI industry is obsessed with Chatbot Arena, but it might not be the best benchmark
LMSYS' Chatbot Arena is perhaps the most popular AI benchmark today -- and an industry obsession. But it's far from a perfect measure.
LMSYS’ creators felt similarly, and so they devised an alternative: Chatbot Arena, a crowdsourced benchmark designed to capture the “nuanced” aspects of models and their performance on open-ended, real-world tasks. But Yuchen Lin, a research scientist at the nonprofit Allen Institute for AI, says that LMSYS hasn’t been completely transparent about the model capabilities, knowledge and skills it’s assessing on Chatbot Arena. Lending credence to his theory, the top questions in the LMSYS-Chat-1M data set pertain to programming, AI tools, software bugs and fixes, and app design, which are not the sorts of things you’d expect non-technical people to ask about.
Or read this on TechCrunch