Google DeepMind researchers introduce new benchmark to improve LLM factuality, reduce hallucinations


On a new benchmark, Google DeepMind found Gemini 2.0 Flash to be the most factual of the LLMs it evaluated, with a score of 83.6%.

Hallucination is a challenge data scientists have long struggled to overcome, and researchers from Google DeepMind now say they have come a step closer to achieving true factuality in foundation models. Ensuring factual accuracy in LLM responses is difficult because of both modeling factors (architecture, training and inference) and measurement factors (evaluation methodologies, data and metrics). To address this, the FACTS dataset incorporates 1,719 examples (860 public and 859 private), each requiring a long-form response grounded in the context of a provided document.

Source: VentureBeat

Related news:

Meta proposes new scalable memory layers that improve knowledge, reduce hallucinations

Google DeepMind Unveils a New Video Model To Rival Sora
