An analysis of DeepSeek's R1-Zero and R1
We launched ARC Prize 2024 last June to grow awareness of the limits of scaling LLMs and to promote a useful benchmark, ARC-AGI-1, pointing toward a new direction: AI systems that must adapt to novel, unseen problems instead of relying strictly on memorization.

A key step in training reasoning systems like R1 is labeling the intermediary chain-of-thought (CoT) steps, using a combination of human experts ("supervised fine-tuning", or SFT) and automated machines ("reinforcement learning", or RL). It is important to watch whether SFT ends up being a requirement for adding CoT search and sampling, or whether a hypothetical "R2-Zero" could exist along the same logarithmic accuracy-vs-inference-compute scaling curve.
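To make the SFT/RL distinction concrete, here is a minimal Python sketch of per-step CoT labeling. Everything in it is a hypothetical illustration, not DeepSeek's actual pipeline: `label_cot_steps` and `auto_label` are invented names, a human annotator stands in for SFT-style expert labeling, and the automated scorer stands in for an RL-style reward.

```python
from typing import Callable, Optional

def auto_label(step: str) -> float:
    """A stand-in automated reward: steps that can be checked score higher."""
    return 1.0 if "verify" in step.lower() else 0.5

def label_cot_steps(
    steps: list[str],
    human_label: Optional[Callable[[str], float]] = None,
) -> list[float]:
    """Score each intermediary CoT step.

    Uses expert labels (the SFT path) when an annotator is available,
    otherwise falls back to the automated reward (the RL path).
    """
    scorer = human_label if human_label is not None else auto_label
    return [scorer(step) for step in steps]

if __name__ == "__main__":
    cot = [
        "Restate the transformation seen in the examples",
        "Propose a candidate rule",
        "Verify the rule against every training pair",
    ]
    print(label_cot_steps(cot))  # -> [0.5, 0.5, 1.0]
```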
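And a toy illustration of the logarithmic accuracy-vs-inference-compute curve referenced above: accuracy is assumed to grow roughly linearly in log(compute) until it saturates. The coefficients are made-up placeholders, not measured values from any model.

```python
import math

# Assumed log-linear relationship: accuracy rises linearly in log10(compute).
# Both coefficients below are illustrative placeholders.
BASE_ACCURACY = 0.10   # assumed accuracy at 1 unit of inference compute
SLOPE = 0.12           # assumed accuracy gained per 10x more compute

def predicted_accuracy(inference_compute: float) -> float:
    """Predict accuracy from inference compute under a log-linear fit."""
    raw = BASE_ACCURACY + SLOPE * math.log10(inference_compute)
    return max(0.0, min(1.0, raw))  # clamp to a valid probability

if __name__ == "__main__":
    # Each 10x increase in compute buys only a fixed accuracy increment,
    # which is why gains from CoT search and sampling flatten out quickly.
    for compute in (1, 10, 100, 1_000, 10_000):
        print(f"{compute:>6} units -> {predicted_accuracy(compute):.0%}")
```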