New technique helps LLMs rein in CoT lengths, optimizing reasoning without exploding compute costs
Carnegie Mellon University researchers propose a new LLM training technique that gives developers more control over chain-of-thought length.
Called length controlled policy optimization (LCPO), the technique conditions the model to provide correct answers while also keeping its “thoughts” within a predetermined token budget. The evaluation covered math problems as well as out-of-distribution tasks such as the Massive Multitask Language Understanding (MMLU) benchmark and the graduate-level Google-proof Q&A benchmark (GPQA). It’s a powerful alternative to simply deploying larger, more expensive models, and could be a crucial factor in making AI more economically viable for high-volume, real-world applications.
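The article doesn't reproduce the training objective, but the core idea lends itself to a simple reward shape: score each rollout on answer correctness, then subtract a penalty proportional to how far the chain of thought strays from the token budget given in the prompt. The sketch below is illustrative only, not the paper's exact formulation; the function name, `alpha`, and the budget values are assumptions.

```python
# Hedged sketch of an LCPO-style reward. This is not the CMU paper's exact
# objective; it just illustrates trading off correctness against adherence
# to a per-prompt token budget. `alpha` and the budgets are made-up values.

def lcpo_reward(is_correct: bool, tokens_used: int, token_budget: int,
                alpha: float = 0.001) -> float:
    """Score a chain-of-thought rollout: reward a correct answer,
    penalize deviation from the prompted token budget."""
    correctness = 1.0 if is_correct else 0.0
    length_penalty = alpha * abs(token_budget - tokens_used)
    return correctness - length_penalty

# Example: a correct answer that overshoots a 512-token budget by 300 tokens
print(lcpo_reward(True, tokens_used=812, token_budget=512))  # 1.0 - 0.3 = 0.7
```

A reward like this, used inside a standard policy-optimization loop, pushes the model toward answers that are both correct and close to the requested length, which is the control the article describes.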
Or read this on VentureBeat