New technique helps LLMs rein in CoT lengths, optimizing reasoning without exploding compute costs


Carnegie Mellon University researchers propose a new LLM training technique that gives developers more control over chain-of-thought length.

Called length controlled policy optimization (LCPO), the technique conditions the model to produce correct answers while also keeping its “thoughts” within a predetermined token budget. The evaluation included math problems as well as out-of-distribution tasks such as the Massive Multitask Language Understanding (MMLU) benchmark and the graduate-level Google-proof Q&A benchmark (GPQA). It’s a powerful alternative to simply deploying larger, more expensive models — and could be a crucial factor in making AI more economically viable for high-volume, real-world applications.
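The core idea — rewarding correctness while penalizing deviation from a prompted token budget — can be sketched as a simple reward function. This is an illustrative approximation, not the paper's exact formulation: the penalty weight `alpha` and the absolute-deviation shape are assumptions for demonstration.

```python
# Hedged sketch of an LCPO-style reward signal. The exact reward in the
# CMU paper may differ; alpha and the penalty form here are illustrative.

def lcpo_reward(is_correct: bool, cot_tokens: int, target_tokens: int,
                alpha: float = 0.001) -> float:
    """Score a sampled chain of thought: reward a correct answer,
    but penalize deviation from the prompted token budget."""
    correctness = 1.0 if is_correct else 0.0
    length_penalty = alpha * abs(target_tokens - cot_tokens)
    return correctness - length_penalty

# A correct answer near the budget beats one that badly overshoots it.
print(lcpo_reward(True, 500, 512))   # close to budget: high reward
print(lcpo_reward(True, 2048, 512))  # large overshoot: reward drops
```

In an RL training loop, a reward like this would be maximized with a policy-gradient method, steering the model toward answers that are both correct and budget-respecting.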

Or read this on VentureBeat

Read more on:

LLMs

reasoning

New technique

Related news:

Asking LLMs to create my game Shepard's Dog

People are just as bad as my LLMs

From Prompt to Adventures: Creating Games with LLMs and Restate Durable Functions