DeepMind and UC Berkeley show how to make the most of LLM inference-time compute


A new study shows that, given the right inference-time strategy, an LLM with a fixed inference budget can outperform larger pre-trained models.

Their findings, detailed in a new research paper, suggest that by allocating inference-time compute wisely, LLMs can achieve substantial performance gains without larger models or additional pre-training. For easier problems, where the base LLM can already produce reasonable responses, letting the model iteratively refine its initial answer proved more effective than generating multiple samples in parallel. For more difficult problems that require exploring different solution strategies, sampling multiple responses in parallel or running tree search against a process-based reward model worked better.
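For intuition, here is a minimal Python sketch of this difficulty-based routing idea. It is not the paper's implementation: `generate`, `refine`, and `score` are hypothetical stubs standing in for a base LLM and a process-based reward model, and the `difficulty` input is assumed to be estimated upstream (the paper estimates question difficulty from the model's own samples rather than taking it as given).

```python
import random

# Hypothetical stand-ins for a real base LLM and reward model.
# These stubs only illustrate the control flow, not the paper's system.

def generate(prompt: str) -> str:
    """Stub: sample one candidate answer from the base LLM."""
    return f"answer-{random.randint(0, 999)} to {prompt!r}"

def refine(prompt: str, previous: str) -> str:
    """Stub: ask the model to revise its previous answer."""
    return previous + " (revised)"

def score(prompt: str, answer: str) -> float:
    """Stub: process-based reward model score in [0, 1]."""
    return random.random()

def sequential_refinement(prompt: str, budget: int) -> str:
    """Spend the whole budget revising one answer (better for easier problems)."""
    answer = generate(prompt)
    for _ in range(budget - 1):
        answer = refine(prompt, answer)
    return answer

def parallel_best_of_n(prompt: str, budget: int) -> str:
    """Spend the budget on independent samples and keep the highest-scoring
    one (better for harder problems that need exploring distinct strategies)."""
    candidates = [generate(prompt) for _ in range(budget)]
    return max(candidates, key=lambda a: score(prompt, a))

def compute_optimal_answer(prompt: str, budget: int, difficulty: float) -> str:
    """Route a fixed inference budget by estimated difficulty, mirroring the
    paper's compute-optimal strategy at a high level."""
    if difficulty < 0.5:  # easier: the base answer is roughly right already
        return sequential_refinement(prompt, budget)
    return parallel_best_of_n(prompt, budget)  # harder: explore in parallel

if __name__ == "__main__":
    print(compute_optimal_answer("What is 17 * 24?", budget=8, difficulty=0.2))
    print(compute_optimal_answer("Prove the lemma...", budget=8, difficulty=0.9))
```

The routing is what makes a fixed budget "compute-optimal" in the paper's sense: the same number of model calls is spent very differently depending on how hard the question appears.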

Or read this on VentureBeat

Read more on:

DeepMind

UC Berkeley

inference-time compute

Related news:

Google strikes licensing deal with Character AI and poaches top executives for DeepMind

DeepMind’s Gemma Scope peers under the hood of large language models

DeepMind makes big jump toward interpreting LLMs with sparse autoencoders