Not every AI prompt deserves multiple seconds of thinking: how Meta is teaching models to prioritize
Given room to explore different solutions, models can learn to allocate their inference budget to match the difficulty of each reasoning problem.
To alleviate this problem, the researchers propose "Inference Budget-Constrained Policy Optimization" (IBPO), a reinforcement learning algorithm that teaches the model to adjust the length of its reasoning traces based on the difficulty of the query. The algorithm surpasses the gains obtained by training on manually labeled data: it repeatedly generates ASV traces, evaluates the responses, and selects the outcomes that produce the correct answer at the optimal inference budget.

IBPO (green circles) outperforms other baselines on the Pareto front (source: arXiv)

The findings come against the backdrop of researchers warning that current AI models are hitting a wall.
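The selection step described above can be sketched in a few lines: sample several reasoning traces per query, keep the ones that reach the correct answer, and prefer the correct trace that consumes the smallest budget. This is a minimal illustrative sketch, not Meta's implementation; the `Trace` class, field names, and the token-count budget measure are all assumptions.

```python
# Hypothetical sketch of budget-aware trace selection. All names here
# (Trace, select_trace, num_tokens) are illustrative assumptions, not
# part of the IBPO paper's code.
from dataclasses import dataclass

@dataclass
class Trace:
    answer: str        # final answer extracted from the reasoning trace
    num_tokens: int    # inference budget this trace consumed

def select_trace(traces, gold_answer):
    """Return the correct trace with the lowest token cost, or None."""
    correct = [t for t in traces if t.answer == gold_answer]
    if not correct:
        return None
    return min(correct, key=lambda t: t.num_tokens)

traces = [
    Trace(answer="42", num_tokens=900),   # correct but verbose
    Trace(answer="41", num_tokens=120),   # short but wrong
    Trace(answer="42", num_tokens=150),   # correct and cheap -> preferred
]
best = select_trace(traces, gold_answer="42")
print(best.num_tokens)  # -> 150
```

In the full algorithm, preferences like this would feed back into a reinforcement-learning update under an explicit budget constraint, rather than being used for one-shot filtering.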
Or read this on VentureBeat