
Q*: Improving Multi-Step Reasoning for LLMs with Deliberative Planning


Large Language Models (LLMs) have demonstrated impressive capabilities in many natural language tasks. However, the auto-regressive generation process makes LLMs prone to producing errors, hallucinations, and inconsistent statements when performing multi-step reasoning. In this paper, we aim to alleviate this pathology by introducing Q*, a general, versatile, and agile framework for guiding the decoding process of LLMs with deliberative planning. By learning a plug-and-play Q-value model as a heuristic function, Q* can effectively guide LLMs to select the most promising next step without fine-tuning them for each task, which avoids significant computational overhead and the risk of performance degradation on other tasks. Extensive experiments on GSM8K, MATH, and MBPP confirm the superiority of our method.
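The idea in the abstract, a learned Q-value model acting as a heuristic that steers step-by-step decoding via best-first (A*-style) search, can be sketched in a few lines. The following is a toy illustration only: `propose_next_steps`, `q_value`, and the arithmetic "reasoning" target are hypothetical stand-ins, not the paper's actual LLM or learned Q-value model.

```python
import heapq

def propose_next_steps(state):
    # Stand-in for sampling k candidate next reasoning steps from an LLM.
    # Here a "step" is just appending a small integer to the trajectory.
    return [state + (d,) for d in (1, 2, 3)]

def q_value(state):
    # Stand-in for the learned plug-and-play Q-value model: higher means
    # more promising. Here we reward trajectories whose sum nears 10.
    return -abs(10 - sum(state))

def q_star_decode(max_expansions=50, target=10):
    """Best-first search over partial trajectories, always expanding the
    state the Q-value heuristic deems most promising."""
    # Negate Q so Python's min-heap pops the highest-value state first.
    frontier = [(-q_value(()), ())]
    for _ in range(max_expansions):
        if not frontier:
            break
        _, state = heapq.heappop(frontier)
        if sum(state) == target:
            return state  # a complete "reasoning" trajectory
        for nxt in propose_next_steps(state):
            heapq.heappush(frontier, (-q_value(nxt), nxt))
    return None
```

In the paper's setting, the base LLM would supply the candidate steps and the trained Q-value model the scores, so the search guides decoding without fine-tuning the LLM itself.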

By Chaojie Wang and 4 other authors.


