
Not every AI prompt deserves multiple seconds of thinking: how Meta is teaching models to prioritize


Given room to explore different solutions, models can learn to allocate their inference budget efficiently across AI reasoning problems.

To alleviate this problem, the researchers propose “Inference Budget-Constrained Policy Optimization” (IBPO), a reinforcement learning algorithm that teaches the model to adjust the length of its reasoning traces based on the difficulty of the query. The RL algorithm enables the model to surpass the gains obtained from training on manually labeled data by continually generating ASV traces, evaluating the responses, and selecting outcomes that yield the correct answer at the optimal inference budget.

[Figure: IBPO (green circles) outperforms other baselines on the Pareto front (source: arXiv)]

The findings come against the backdrop of researchers warning that current AI models are hitting a wall.
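The selection loop described above can be sketched in a few lines: sample reasoning traces at several inference budgets, keep only those that reach the correct answer, and prefer the cheapest correct one as the training target. This is a toy illustration of the idea, not the paper's actual method; `generate_trace` and the difficulty model are hypothetical stand-ins.

```python
# Toy sketch of budget-aware trace selection. Assumptions (not from the
# paper): generate_trace() stands in for model sampling, and a query's
# "difficulty" determines how many tokens a correct answer needs.

BUDGETS = [64, 256, 1024]  # candidate token budgets (illustrative)

def generate_trace(query, budget):
    """Stand-in for model sampling: returns (answer, tokens_used)."""
    # Pretend harder queries need proportionally larger budgets.
    tokens_needed = query["difficulty"] * 100
    tokens_used = min(budget, tokens_needed)
    answer = query["gold"] if budget >= tokens_needed else None
    return answer, tokens_used

def select_training_trace(query):
    """Among sampled traces, pick a correct one with minimal cost."""
    candidates = []
    for budget in BUDGETS:
        answer, used = generate_trace(query, budget)
        if answer == query["gold"]:            # reward: correctness
            candidates.append((used, budget, answer))
    if not candidates:
        return None                            # no correct trace found
    used, budget, answer = min(candidates)     # constraint: cheapest wins
    return {"budget": budget, "tokens": used, "answer": answer}

easy = {"difficulty": 1, "gold": "42"}   # solvable within ~100 tokens
hard = {"difficulty": 8, "gold": "7"}    # needs ~800 tokens

print(select_training_trace(easy))  # a small budget suffices
print(select_training_trace(hard))  # only the largest budget is correct
```

The key property mirrored here is that the easy query settles on a small budget while the hard query is forced to the large one, which is the kind of difficulty-dependent allocation IBPO's constrained objective rewards.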


Read the full article on VentureBeat.

