Deep Think with Confidence
DeepConf: Scaling LLM Reasoning with Confidence, Not Just Compute
By intelligently filtering out low-quality reasoning traces and enabling early stopping, DeepConf achieves state-of-the-art accuracy (e.g., 99.9% on the AIME 2025 benchmark with GPT-OSS-120B) while reducing the number of generated tokens by up to 84.7%. Extensive ablation studies in the paper's appendix confirm the method's robustness: the gains are not overly sensitive to the size of the warmup set or the filtering percentage. By shifting the focus from brute-force generation to confidence-guided selection and early termination, DeepConf delivers a rare combination of improved accuracy and vastly superior computational efficiency.
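The core idea, filtering sampled reasoning traces by a confidence score and then voting over the survivors, can be sketched in a few lines. This is a hypothetical illustration, not the paper's exact method: the trace format, the sliding-window-minimum confidence, and the `keep_frac` filtering percentage are all assumptions made for the example.

```python
# Hypothetical sketch of confidence-filtered voting in the spirit of DeepConf.
# The data format and scoring rule below are illustrative assumptions.
from collections import Counter

def trace_confidence(token_logprobs, window=5):
    """Score a trace by the minimum mean log-prob over a sliding window:
    a single low window flags a low-quality stretch of reasoning.
    (In an online setting, the same signal could trigger early stopping.)"""
    if len(token_logprobs) < window:
        return sum(token_logprobs) / len(token_logprobs)
    means = [sum(token_logprobs[i:i + window]) / window
             for i in range(len(token_logprobs) - window + 1)]
    return min(means)

def confident_vote(traces, keep_frac=0.5):
    """Keep the top keep_frac of traces by confidence, then majority-vote
    over their final answers instead of voting over all traces."""
    scored = sorted(traces, key=lambda t: trace_confidence(t["logprobs"]),
                    reverse=True)
    kept = scored[:max(1, int(len(scored) * keep_frac))]
    return Counter(t["answer"] for t in kept).most_common(1)[0][0]

# Toy usage: the low-confidence trace (wrong answer) is filtered out
# before the vote.
traces = [
    {"answer": "42", "logprobs": [-0.1, -0.2, -0.1, -0.15, -0.1, -0.2]},
    {"answer": "42", "logprobs": [-0.3, -0.1, -0.2, -0.1, -0.25, -0.2]},
    {"answer": "17", "logprobs": [-2.0, -1.5, -3.0, -2.5, -1.8, -2.2]},
]
print(confident_vote(traces))  # "42"
```

The design choice worth noting is the window minimum rather than the whole-trace mean: averaging over an entire trace can mask a short burst of low-confidence tokens, which is exactly the signal used to drop a trace or stop generating it early.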