Deep Think with Confidence


DeepConf: Scaling LLM Reasoning with Confidence, Not Just Compute

DeepConf (Deep Think with Confidence) is a test-time method that scores sampled reasoning traces by the model's own confidence, filters out low-confidence reasoning, and stops unpromising generations early. With this confidence-guided filtering and early stopping, it reaches state-of-the-art accuracy (e.g., 99.9% on the AIME 2025 benchmark with GPT-OSS-120B) while cutting the number of generated tokens by up to 84.7%. Ablation studies in the paper's appendix support the method's robustness: the gains are not overly sensitive to the size of the warmup set or to the filtering percentage. By shifting the focus from brute-force generation to confidence-guided selection and early termination, DeepConf delivers both higher accuracy and far lower compute cost.
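To make the idea concrete, here is a minimal Python sketch of confidence-guided filtering and voting. It is not the paper's exact formulation: it assumes each sampled trace carries its token log-probabilities, uses the mean log-probability as a stand-in confidence score, and introduces illustrative names (trace_confidence, confidence_filtered_vote, keep_fraction, should_stop_early) that do not come from the paper.

from collections import Counter


def trace_confidence(token_logprobs):
    # Mean token log-probability as a stand-in confidence score.
    # (The paper's exact confidence measure is not reproduced here.)
    return sum(token_logprobs) / max(len(token_logprobs), 1)


def should_stop_early(recent_logprobs, threshold):
    # Online check: abort a trace whose recent-token confidence falls below a
    # threshold, e.g. one calibrated on a small warmup set of completed traces.
    # This illustrates the early-termination idea only at sketch level.
    return trace_confidence(recent_logprobs) < threshold


def confidence_filtered_vote(traces, keep_fraction=0.5):
    # traces: list of (final_answer, token_logprobs) pairs obtained by
    #         sampling the same prompt repeatedly.
    # keep_fraction: fraction of traces retained after filtering, an
    #                illustrative stand-in for the paper's filtering percentage.
    scored = sorted(traces, key=lambda t: trace_confidence(t[1]), reverse=True)
    kept = scored[:max(1, int(len(scored) * keep_fraction))]
    votes = Counter(answer for answer, _ in kept)
    return votes.most_common(1)[0][0]


# Toy usage: three sampled traces with final answers and token log-probs.
traces = [
    ("42", [-0.1, -0.2, -0.1]),   # high confidence
    ("42", [-0.4, -0.3, -0.5]),   # medium confidence
    ("17", [-2.5, -3.0, -2.8]),   # low confidence, filtered out before voting
]
print(confidence_filtered_vote(traces, keep_fraction=0.67))  # prints "42"

In practice the filtering percentage and the early-stop threshold would be tuned on the warmup set the summary mentions; the key design point is that low-confidence traces are dropped before voting rather than averaged in, and weak traces can be cut off mid-generation to save tokens.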


Related news:

Senior AI researchers desert Apple amid ‘a crisis of confidence’

Still Wakes the Deep studio The Chinese Room goes independent to continue focus on original projects

Confidence in agentic AI: Why eval infrastructure must come first