Contemplative LLMs
Let the LLM 'contemplate' before answering
Our large-scale reinforcement learning algorithm teaches the model how to think productively using its chain of thought in a highly data-efficient training process. The main motivation for this prompt came from reading the raw chain-of-thought (CoT) text of o1 in the official blog post. Now, instead of asking the model to respond in JSON, we use XML tags to mark the start and end of the contemplation phase and of the final answer.
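As a rough illustration of the XML-tag approach, here is a minimal Python sketch of how a client might split such a response into its contemplation and final answer. The tag names `contemplator` and `final_answer`, the helper names, and the sample response are assumptions made for this example; the text above only says that XML tags delimit the two phases.

```python
import re

# Hypothetical tag names, assumed for illustration only.
CONTEMPLATION_TAG = "contemplator"
ANSWER_TAG = "final_answer"


def extract_tag(text, tag):
    """Return the stripped text inside the first <tag>...</tag> pair, or None."""
    pattern = rf"<{re.escape(tag)}>(.*?)</{re.escape(tag)}>"
    match = re.search(pattern, text, re.DOTALL)  # DOTALL lets '.' span newlines
    return match.group(1).strip() if match else None


def split_response(response):
    """Separate the contemplation phase from the final answer."""
    return {
        "contemplation": extract_tag(response, CONTEMPLATION_TAG),
        "final_answer": extract_tag(response, ANSWER_TAG),
    }


# Made-up model output, used only to exercise the parser.
sample = """<contemplator>
Hmm, 17 * 3... 10 * 3 is 30, 7 * 3 is 21, so 51.
</contemplator>
<final_answer>
51
</final_answer>"""

parts = split_response(sample)
print(parts["final_answer"])
```

Compared with asking for JSON, tags like these let the model write free-form reasoning without escaping quotes or newlines, and a missing tag simply yields `None` rather than a parse error.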