Less is more: UC Berkeley and Google unlock LLM potential through simple sampling
With repeated sampling and self-verification, Gemini 1.5 Pro can outperform o1-preview on reasoning tasks.
The core finding is that even a minimalist implementation of sampling-based search, using only random sampling and self-verification, can elevate the reasoning performance of models like Gemini 1.5 Pro beyond that of o1-preview on popular benchmarks. The currently popular method for test-time scaling in LLMs is to train the model through reinforcement learning to generate longer responses with chain-of-thought (CoT) traces. Sampling-based search offers a simpler and highly scalable alternative approach to test-time scaling: let the model generate multiple responses and select the best one through a verification mechanism.
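At a high level, the idea can be sketched in a few lines of Python. The `generate` and `verify` functions below are hypothetical stand-ins for calls to a language model (one call to produce a candidate answer, another to have the model judge that answer); the actual prompts, models, and scoring used in the research are not reproduced here.

```python
from typing import Callable, List


def sampling_based_search(
    question: str,
    generate: Callable[[str], str],        # hypothetical LLM call: prompt -> candidate answer
    verify: Callable[[str, str], float],   # hypothetical self-verification: (question, answer) -> score
    num_samples: int = 16,
) -> str:
    """Minimal sketch of sampling-based search:
    draw several candidate responses, score each with self-verification,
    and return the highest-scoring candidate."""
    # 1. Random sampling: generate several independent candidate responses.
    candidates: List[str] = [generate(question) for _ in range(num_samples)]

    # 2. Self-verification: ask the model to rate each candidate's correctness.
    scores = [verify(question, answer) for answer in candidates]

    # 3. Selection: keep the candidate the verifier rates highest.
    best_index = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best_index]


if __name__ == "__main__":
    # Toy stand-ins purely for illustration; a real setup would call an LLM API.
    import random

    def fake_generate(q: str) -> str:
        return f"answer-{random.randint(0, 3)}"

    def fake_verify(q: str, a: str) -> float:
        return 1.0 if a.endswith("3") else 0.0

    print(sampling_based_search("What is 2 + 2?", fake_generate, fake_verify, num_samples=8))
```

The appeal of this setup is that the candidates are independent of each other, so generation parallelizes trivially and the approach scales simply by drawing more samples.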
Or read this on VentureBeat