DeepMind and UC Berkeley show how to make the most of LLM inference-time compute
A new study shows that, given the right prompts, LLMs with finite inference budgets can outperform larger pre-trained models.
Their findings, detailed in a new research paper, suggest that by optimizing the use of inference-time compute, LLMs can achieve substantial performance gains without the need for larger models or extensive pre-training. For easier problems, where the base LLM can already produce reasonable responses, allowing the model to iteratively refine its initial answer proved more effective than generating multiple samples in parallel. For more difficult problems that require exploring different solution strategies, the researchers found that sampling multiple responses in parallel, or running tree search guided by a process-based reward model, was more effective.
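In rough terms, this adaptive recipe can be sketched as follows. The snippet below is a minimal illustration, not the paper's implementation: every function in it (generate, revise, estimate_difficulty, reward_model_score) is a hypothetical stand-in for a real model call, and the 0.5 difficulty threshold and fixed call budget are assumptions made for the example.

```python
import random
random.seed(0)

def generate(prompt: str) -> str:
    """Hypothetical base-LLM call: returns one candidate answer."""
    return f"answer({random.random():.3f})"

def revise(prompt: str, draft: str) -> str:
    """Hypothetical revision call: asks the model to improve its own draft."""
    return draft + "+rev"

def estimate_difficulty(prompt: str) -> float:
    """Hypothetical difficulty proxy in [0, 1], e.g. derived from verifier scores."""
    return random.random()

def reward_model_score(prompt: str, answer: str) -> float:
    """Hypothetical reward-model score standing in for a process-based reward model."""
    return random.random()

def answer_with_budget(prompt: str, budget: int) -> str:
    """Spend a fixed budget of model calls, picking the strategy by difficulty."""
    if estimate_difficulty(prompt) < 0.5:
        # Easier problem: sequentially refine one answer,
        # spending the remaining budget on revisions.
        answer = generate(prompt)
        for _ in range(budget - 1):
            answer = revise(prompt, answer)
        return answer
    # Harder problem: sample candidates in parallel and
    # keep the one the reward model scores highest.
    candidates = [generate(prompt) for _ in range(budget)]
    return max(candidates, key=lambda a: reward_model_score(prompt, a))

print(answer_with_budget("What is 17 * 24?", budget=4))
```

The key design point the sketch captures is that the same inference budget is spent differently depending on estimated difficulty: depth (sequential revisions) for easy problems, breadth (parallel samples scored by a reward model) for hard ones.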
Or read this on VentureBeat