
Less is more: UC Berkeley and Google unlock LLM potential through simple sampling


With multiple sampling and self-verification, Gemini 1.5 Pro can outperform o1-preview in reasoning tasks.

The core finding is that even a minimalist implementation of sampling-based search, using random sampling and self-verification, can lift the reasoning performance of models like Gemini 1.5 Pro above that of o1-preview on popular benchmarks. The currently dominant approach to test-time scaling in LLMs is to train the model, via reinforcement learning, to generate longer responses with chain-of-thought (CoT) traces. Sampling-based search offers a simpler and highly scalable alternative: let the model generate multiple responses and select the best one through a verification mechanism.
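In a minimal sketch, the loop is just: sample k candidate responses at a nonzero temperature, ask the model to score each one (self-verification), and return the top-scoring candidate. The `generate` and `verify` functions below are hypothetical stand-ins for real LLM calls, stubbed so the sketch runs without an API; only the selection logic reflects the method described above.

```python
import random

# Hypothetical stand-ins for real LLM calls (not from the paper's code):
# generate() samples one candidate response, verify() scores a response.
# Both are stubbed with a seeded RNG so the sketch is runnable offline.
def generate(prompt: str, rng: random.Random) -> str:
    return f"candidate answer {rng.randint(0, 9)}"

def verify(prompt: str, response: str, rng: random.Random) -> float:
    # In the actual setup the model itself judges each response;
    # a random score is a placeholder for that judgment.
    return rng.random()

def sampling_based_search(prompt: str, k: int = 8, seed: int = 0) -> str:
    """Minimal sampling-based search: draw k responses, self-verify
    each, and return the highest-scoring candidate."""
    rng = random.Random(seed)
    candidates = [generate(prompt, rng) for _ in range(k)]
    scores = [verify(prompt, c, rng) for c in candidates]
    best = max(range(k), key=lambda i: scores[i])
    return candidates[best]

answer = sampling_based_search("What is 17 * 24?", k=4)
```

The appeal is that this scales trivially: increasing `k` buys more accuracy with no retraining, at the cost of more inference calls.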


Or read this on Venture Beat

Read more on: Google, UC Berkeley, simple sampling

Related news:


Low RAM on the Pixel 9a pushed Google toward a 'limited' version of Gemini


Fake Semrush ads used to steal SEO professionals’ Google accounts


Google says its European 'experiment' shows news is worthless to its ad business