Evals in 2025: going beyond simple benchmarks to build models people can use
Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard and designing lighteval! - huggingface/evaluation-guidebook