
DeepMind’s GenRM improves LLM accuracy by having models verify their own outputs


DeepMind's GenRM trains LLMs as generative verifiers that check candidate solutions using next-token prediction and chain-of-thought (CoT) reasoning.

In a new paper, researchers at Google DeepMind, the University of Toronto, Mila, and UCLA introduce GenRM, an approach that leverages the generative capabilities of LLMs to build more effective verifiers. In reasoning domains, LLM-based verifiers are typically trained as discriminative reward models (RMs) that assign a numerical score to each candidate solution, which is then used to classify the solution as correct or incorrect. GenRM instead recasts verification as next-token prediction, which lets the verifier reason step by step before rendering a judgment. To evaluate GenRM's effectiveness, the researchers tested it on several reasoning tasks, including last-letter concatenation, word sorting, and word-math problems.
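Framing verification as next-token prediction means the verifier is prompted with a problem and a candidate solution followed by a question such as "Is the answer correct (Yes/No)?", and the solution's score is the probability the model assigns to the "Yes" token. The sketch below illustrates that scoring scheme and how such scores can rank candidates in a best-of-N setup; `next_token_probs` is a hypothetical stand-in for a real LLM's next-token distribution, not DeepMind's actual API, and the prompt wording is illustrative only.

```python
# Sketch of GenRM-style verification as next-token prediction.
# `next_token_probs` is a toy stub standing in for an LLM's next-token
# distribution, so the example is self-contained and runnable.

def next_token_probs(prompt: str) -> dict[str, float]:
    """Hypothetical stand-in for an LLM's next-token distribution."""
    # Toy heuristic: assign high P("Yes") when the stated answer is 8.
    if "answer is 8" in prompt:
        return {"Yes": 0.9, "No": 0.1}
    return {"Yes": 0.2, "No": 0.8}

def genrm_score(question: str, solution: str) -> float:
    """Score a candidate solution as P("Yes" | verification prompt)."""
    prompt = (
        f"Question: {question}\n"
        f"Proposed solution: {solution}\n"
        "Is the answer correct (Yes/No)? "
    )
    return next_token_probs(prompt).get("Yes", 0.0)

def best_of_n(question: str, candidates: list[str]) -> str:
    """Best-of-N selection: return the candidate the verifier scores highest."""
    return max(candidates, key=lambda s: genrm_score(question, s))

candidates = [
    "3 + 5 = 7, so the answer is 7.",
    "3 + 5 = 8, so the answer is 8.",
]
print(best_of_n("What is 3 + 5?", candidates))
# prints "3 + 5 = 8, so the answer is 8."
```

In the paper's CoT variant, the verifier first generates a verification rationale and only then emits the Yes/No token, so the score reflects explicit step-by-step checking rather than a single forward pass.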


Read the original article on VentureBeat.


Related news:

- OpenAI and Anthropic agree to send models to US Government for safety evaluation
- OpenAI and Anthropic agree to share their models with the US AI Safety Institute
- OpenAI and Anthropic will share their models with the US government