Evals in 2025: going beyond simple benchmarks to build models people can use


Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard and designing lighteval! - huggingface/evaluation-guidebook


