Google DeepMind researchers introduce new benchmark to improve LLM factuality, reduce hallucinations

Based on a new benchmark, Google DeepMind found Gemini 2.0 Flash to be the most factual LLM, with a score of 83.6%.

Hallucination is a challenge data scientists have struggled to overcome, and now researchers from Google DeepMind say they have come a step closer to achieving true factuality in foundation models. Ensuring factual accuracy in LLM responses is difficult because of both modeling factors (architecture, training and inference) and measurement factors (evaluation methodologies, data and metrics). To address this, the researchers' FACTS dataset incorporates 1,719 examples (860 public and 859 private), each requiring a long-form response based on context in provided documents.

Read this on VentureBeat