New technique helps LLMs rein in CoT lengths, optimizing reasoning without exploding compute costs


Carnegie Mellon University researchers propose a new LLM training technique that gives developers more control over chain-of-thought length.

Called length controlled policy optimization (LCPO), the technique conditions the model to produce correct answers while also keeping its “thoughts” within a predetermined token budget. The evaluation included math problems as well as out-of-distribution tasks such as the Massive Multitask Language Understanding (MMLU) benchmark and the graduate-level Google-proof Q&A benchmark (GPQA). It’s a powerful alternative to simply deploying larger, more expensive models — and could be a crucial factor in making AI more economically viable for high-volume, real-world applications.
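The core idea — rewarding correctness while penalizing deviation from a prompted token budget — can be sketched as a simple reward function. This is an illustrative approximation, not the paper's exact formulation: the penalty weight `alpha` and the absolute-deviation shape are assumptions for demonstration.

```python
# Hedged sketch of an LCPO-style reward signal. The exact reward in the
# CMU paper may differ; alpha and the penalty form here are illustrative.

def lcpo_reward(is_correct: bool, cot_tokens: int, target_tokens: int,
                alpha: float = 0.001) -> float:
    """Score a sampled chain of thought: reward a correct answer,
    but penalize deviation from the prompted token budget."""
    correctness = 1.0 if is_correct else 0.0
    length_penalty = alpha * abs(target_tokens - cot_tokens)
    return correctness - length_penalty

# A correct answer near the budget beats one that badly overshoots it.
print(lcpo_reward(True, 500, 512))   # close to budget: high reward
print(lcpo_reward(True, 2048, 512))  # large overshoot: reward drops
```

In an RL training loop, a reward like this would be maximized with a policy-gradient method, steering the model toward answers that are both correct and budget-respecting.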

Or read this on VentureBeat

Read more on:

LLMs

reasoning

New technique

Related news:

Asking LLMs to create my game Shepard's Dog

People are just as bad as my LLMs

From Prompt to Adventures: Creating Games with LLMs and Restate Durable Functions