LiveBench is an open LLM benchmark that uses contamination-free test data and objective scoring


Yann LeCun and other researchers have developed LiveBench, an open AI benchmark evaluating models using challenging, contamination-free test data.

LiveBench “contains frequently updated questions from recent sources,” scores “answers automatically according to objective ground-truth values,” and covers “a wide variety of challenging tasks” spanning math, coding, reasoning, language, instruction following, and data analysis. According to the team, “This project started with the initial thought that we should build a benchmark where diverse questions are freshly generated every time we evaluate a model, making test set contamination impossible.” The researchers also point to the drawbacks of relying on human-written questions: people could “influence how questions are generated, offering less diverse queries, favoring specific topics that don’t probe a model’s general capabilities, or simply writing poorly constructed prompts.”
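The scoring idea is straightforward to sketch. Below is a minimal, hypothetical Python illustration of what “scoring answers automatically according to objective ground-truth values” can look like; this is not LiveBench’s actual harness, and the query_model helper and sample questions are invented for the example.

```python
def query_model(prompt: str) -> str:
    """Hypothetical stand-in for the model under evaluation;
    a real harness would call an LLM API here."""
    canned = {
        "What is 17 * 24?": "408",
        "Sort the list [3, 1, 2] ascending.": "[1, 2, 3]",
    }
    return canned.get(prompt, "")

def normalize(answer: str) -> str:
    """Light normalization so trivially different answer strings still match."""
    return answer.strip().lower()

# Each freshly generated question carries an objective ground-truth value,
# so answers can be checked mechanically, with no human or LLM judge.
questions = [
    {"prompt": "What is 17 * 24?", "ground_truth": "408"},
    {"prompt": "Sort the list [3, 1, 2] ascending.", "ground_truth": "[1, 2, 3]"},
]

def evaluate(questions) -> float:
    """Score automatically: exact match against the ground-truth value."""
    correct = sum(
        normalize(query_model(q["prompt"])) == normalize(q["ground_truth"])
        for q in questions
    )
    return correct / len(questions)

print(f"accuracy: {evaluate(questions):.2f}")  # -> accuracy: 1.00
```

Because each run regenerates the question set from recent sources, a model cannot simply have memorized the answers from its training data, which is what makes the exact-match scoring meaningful.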

Read the full story on VentureBeat.


Related news:

ASUS quietly built supercomputers, datacenters and an LLM. Now it's quietly selling them all together

A beginner's guide to fine-tuning LLMs using LoRA

YaFSDP: a sharded data parallelism framework, faster for pre-training LLMs