LiveBench is an open LLM benchmark that uses contamination-free test data and objective scoring
Yann LeCun and other researchers have developed LiveBench, an open AI benchmark evaluating models using challenging, contamination-free test data.
LiveBench uses “frequently updated questions from recent sources,” scores “answers automatically according to objective ground-truth values,” and “contains a wide variety of challenging tasks spanning math, coding, reasoning, language, instruction following, and data analysis.”

“This project started with the initial thought that we should build a benchmark where diverse questions are freshly generated every time we evaluate a model, making test set contamination impossible,” the researchers write. “Moreover, humans could influence how questions are generated, offering less diverse queries, favoring specific topics that don’t probe a model’s general capabilities, or simply writing poorly constructed prompts.”
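The key idea behind objective ground-truth scoring is that each answer is checked mechanically against a known correct value, with no human or LLM judge in the loop. The following minimal Python sketch illustrates that principle; the names (`Question`, `score_model`) and the exact-match comparison are hypothetical illustrations, not LiveBench’s actual code or API.

```python
# Minimal sketch of objective, ground-truth-based scoring.
# All names here are hypothetical, for illustration only.

from dataclasses import dataclass
from typing import Callable


@dataclass
class Question:
    prompt: str        # task text shown to the model
    ground_truth: str  # objective reference answer


def score_model(model: Callable[[str], str],
                questions: list[Question]) -> float:
    """Return the fraction of questions answered correctly.

    Each answer is compared directly against a known
    ground-truth value; no subjective judging is involved.
    """
    correct = sum(
        model(q.prompt).strip() == q.ground_truth.strip()
        for q in questions
    )
    return correct / len(questions)


# Example with a trivial stand-in "model":
questions = [Question("What is 7 * 8?", "56")]
print(score_model(lambda prompt: "56", questions))  # 1.0
```

Because the reference answers are objective, a scheme like this can be rerun automatically each time fresh questions are generated, which is what makes contamination-resistant, frequently refreshed evaluation practical.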