LiveBench is an open LLM benchmark that uses contamination-free test data and objective scoring
Yann LeCun and other researchers have developed LiveBench, an open AI benchmark evaluating models using challenging, contamination-free test data.
LiveBench uses “frequently updated questions from recent sources,” scores “answers automatically according to objective ground-truth values,” and “contains a wide variety of challenging tasks spanning math, coding, reasoning, language, instruction following, and data analysis.”

“This project started with the initial thought that we should build a benchmark where diverse questions are freshly generated every time we evaluate a model, making test set contamination impossible,” the researchers write. “Moreover, humans could influence how questions are generated, offering less diverse queries, favoring specific topics that don’t probe a model’s general capabilities, or simply writing poorly constructed prompts.”
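The key idea behind objective ground-truth scoring is that each answer is checked mechanically against a known correct value, with no human or LLM judge in the loop. The following minimal Python sketch illustrates that principle; the names (`Question`, `score_model`) and the exact-match comparison are hypothetical illustrations, not LiveBench’s actual code or API.

```python
# Minimal sketch of objective, ground-truth-based scoring.
# All names here are hypothetical, for illustration only.

from dataclasses import dataclass
from typing import Callable


@dataclass
class Question:
    prompt: str        # task text shown to the model
    ground_truth: str  # objective reference answer


def score_model(model: Callable[[str], str],
                questions: list[Question]) -> float:
    """Return the fraction of questions answered correctly.

    Each answer is compared directly against a known
    ground-truth value; no subjective judging is involved.
    """
    correct = sum(
        model(q.prompt).strip() == q.ground_truth.strip()
        for q in questions
    )
    return correct / len(questions)


# Example with a trivial stand-in "model":
questions = [Question("What is 7 * 8?", "56")]
print(score_model(lambda prompt: "56", questions))  # 1.0
```

Because the reference answers are objective, a scheme like this can be rerun automatically each time fresh questions are generated, which is what makes contamination-resistant, frequently refreshed evaluation practical.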