Deep Think with Confidence
DeepConf: Scaling LLM Reasoning with Confidence, Not Just Compute
By intelligently filtering out low-quality reasoning traces and enabling early stopping, DeepConf achieves state-of-the-art accuracy (e.g., 99.9% on the AIME 2025 benchmark with GPT-OSS-120B) while reducing the number of generated tokens by up to 84.7%. Extensive ablation studies in the paper's appendix confirm the method's robustness: the gains are not overly sensitive to the size of the warmup set or the filtering percentage. By shifting the focus from brute-force generation to confidence-guided selection and early termination, DeepConf delivers a rare combination of improved accuracy and vastly superior computational efficiency.
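The core idea, filtering sampled reasoning traces by a confidence score and then voting over the survivors, can be sketched in a few lines. This is a hypothetical illustration, not the paper's exact method: the trace format, the sliding-window-minimum confidence, and the `keep_frac` filtering percentage are all assumptions made for the example.

```python
# Hypothetical sketch of confidence-filtered voting in the spirit of DeepConf.
# The data format and scoring rule below are illustrative assumptions.
from collections import Counter

def trace_confidence(token_logprobs, window=5):
    """Score a trace by the minimum mean log-prob over a sliding window:
    a single low window flags a low-quality stretch of reasoning.
    (In an online setting, the same signal could trigger early stopping.)"""
    if len(token_logprobs) < window:
        return sum(token_logprobs) / len(token_logprobs)
    means = [sum(token_logprobs[i:i + window]) / window
             for i in range(len(token_logprobs) - window + 1)]
    return min(means)

def confident_vote(traces, keep_frac=0.5):
    """Keep the top keep_frac of traces by confidence, then majority-vote
    over their final answers instead of voting over all traces."""
    scored = sorted(traces, key=lambda t: trace_confidence(t["logprobs"]),
                    reverse=True)
    kept = scored[:max(1, int(len(scored) * keep_frac))]
    return Counter(t["answer"] for t in kept).most_common(1)[0][0]

# Toy usage: the low-confidence trace (wrong answer) is filtered out
# before the vote.
traces = [
    {"answer": "42", "logprobs": [-0.1, -0.2, -0.1, -0.15, -0.1, -0.2]},
    {"answer": "42", "logprobs": [-0.3, -0.1, -0.2, -0.1, -0.25, -0.2]},
    {"answer": "17", "logprobs": [-2.0, -1.5, -3.0, -2.5, -1.8, -2.2]},
]
print(confident_vote(traces))  # "42"
```

The design choice worth noting is the window minimum rather than the whole-trace mean: averaging over an entire trace can mask a short burst of low-confidence tokens, which is exactly the signal used to drop a trace or stop generating it early.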