DeepMind’s GenRM improves LLM accuracy by having models verify their own outputs
DeepMind's GenRM trains LLMs to verify responses based on next-token prediction and chain-of-thought (CoT) reasoning.
In a new paper, researchers at Google DeepMind, the University of Toronto, Mila, and UCLA introduce GenRM, a novel approach that leverages the generative capabilities of LLMs to build more effective verifiers. In reasoning domains, LLM-based verifiers are typically trained as discriminative reward models (RMs) that assign a numerical score to each candidate solution, which is then used to classify the solution as correct or incorrect. GenRM instead recasts verification as next-token prediction: the verifier is trained to answer whether a solution is correct, optionally generating a chain-of-thought rationale before giving its verdict. To evaluate GenRM's effectiveness, the researchers tested it on several reasoning tasks, including last-letter concatenation, word sorting, and word-math problems.
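To make the contrast concrete, here is a minimal sketch of how a generative verifier scores a candidate solution as the probability of a "Yes" token under next-token prediction, rather than reading a scalar off a value head as a discriminative RM would. The model name and prompt template below are illustrative assumptions, not the paper's exact setup:

```python
# Sketch of GenRM-style verification via next-token prediction.
# Assumes a Hugging Face causal LM; the model and prompt wording
# are placeholders, not the paper's exact configuration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "google/gemma-2b"  # placeholder checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

def genrm_score(question: str, candidate_solution: str) -> float:
    """Score a candidate as the probability of answering 'Yes'.

    A discriminative RM outputs a scalar from a regression head;
    a generative verifier instead asks a yes/no question and reads
    the answer off the model's next-token distribution.
    """
    prompt = (
        f"Question: {question}\n"
        f"Proposed solution: {candidate_solution}\n"
        "Is the solution correct? Answer Yes or No.\nAnswer:"
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # next-token logits
    probs = torch.softmax(logits, dim=-1)
    yes_id = tokenizer.encode(" Yes", add_special_tokens=False)[0]
    return probs[yes_id].item()  # higher = more likely correct

# Best-of-N selection: rank sampled solutions by verifier score.
candidates = ["4", "5"]
scores = [genrm_score("What is 2 + 2?", c) for c in candidates]
best = candidates[scores.index(max(scores))]
```

Because the verifier is just a language model, the same recipe extends to chain-of-thought verification: the model can generate a rationale before the Yes/No verdict, and multiple sampled rationales can be averaged to spend more compute on harder verification problems.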
Or read this on VentureBeat