Contemplative LLMs

Let the LLM 'contemplate' before answering

Our large-scale reinforcement learning algorithm teaches the model how to think productively using its chain of thought in a highly data-efficient training process. The main motivation for this prompt came from reading the raw Chain of Thought (CoT) text of o1 in the official blog post. Now, instead of asking the model to respond in JSON, we use XML tags to separate the start and end of the contemplation phase and the final answer.
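As a rough illustration of the XML-tag approach, the sketch below splits a model response into its contemplation and final-answer parts. The tag names `contemplator` and `final_answer`, and the helper `split_response`, are assumptions made for this example; the excerpt does not state which tags the prompt actually uses.

```python
import re
from typing import Optional, Tuple


def split_response(
    text: str,
    think_tag: str = "contemplator",   # assumed tag name, not from the excerpt
    answer_tag: str = "final_answer",  # assumed tag name, not from the excerpt
) -> Tuple[Optional[str], Optional[str]]:
    """Return (contemplation, final_answer); a part is None if its tags are missing."""

    def extract(tag: str) -> Optional[str]:
        # Non-greedy match so only the first <tag>...</tag> pair is captured;
        # DOTALL lets the content span multiple lines.
        pattern = rf"<{re.escape(tag)}>(.*?)</{re.escape(tag)}>"
        match = re.search(pattern, text, re.DOTALL)
        return match.group(1).strip() if match else None

    return extract(think_tag), extract(answer_tag)


if __name__ == "__main__":
    sample = (
        "<contemplator>\nLet me think about this step by step.\n</contemplator>\n"
        "<final_answer>\n4\n</final_answer>"
    )
    thoughts, answer = split_response(sample)
    print("Contemplation:", thoughts)
    print("Answer:", answer)
```

Compared with asking for JSON, tags like these tolerate free-form multi-line reasoning without escaping quotes or newlines, which is one plausible reason to prefer them here.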
