
Evals in 2025: going beyond simple benchmarks to build models people can use

Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard and designing lighteval! - huggingface/evaluation-guidebook


