Hugging Face shows how test-time scaling helps small language models punch above their weight


Given enough time to "think," small language models can beat LLMs at math and coding tasks by generating and verifying multiple answers.

Image source: Hugging Face

The work is inspired by OpenAI's o1, which spends extra inference-time "thinking" to solve complex math, coding and reasoning problems. The simplest way to apply test-time scaling is "majority voting," in which the same prompt is sent to the model multiple times and the most frequent answer is chosen. For example, if you are short on memory or can tolerate slower response times, you can use a small model and spend more inference-time cycles to generate more accurate answers.
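The majority-voting idea above can be sketched in a few lines of Python. This is a minimal illustration, not Hugging Face's implementation: `generate_answer` is a hypothetical stand-in for sampling a small language model with temperature > 0, so its canned outputs simulate the answer variation a real model would produce.

```python
from collections import Counter

def generate_answer(prompt: str, seed: int) -> str:
    # Hypothetical stand-in for a small-LM call; a real setup would
    # sample from the model with temperature > 0 so answers vary.
    fake_samples = ["42", "42", "41", "42", "40"]
    return fake_samples[seed % len(fake_samples)]

def majority_vote(prompt: str, n_samples: int = 5) -> str:
    """Send the same prompt n_samples times and keep the most common answer."""
    answers = [generate_answer(prompt, seed=i) for i in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

print(majority_vote("What is 6 * 7?"))  # prints "42": the most frequent sample
```

Spending more samples costs more inference time but raises the chance that the consensus answer is correct, which is the memory-for-compute trade-off the article describes.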

Or read this on VentureBeat

Read more on: weight, Hugging Face, time scaling

Related news:

Hugging Face's SmolVLM could cut AI costs for businesses by a huge margin

OpenAI Sora allegedly leaked to Hugging Face

South Korea court convicts man for dodging military draft by gaining weight