Open-source DeepSeek-R1 uses pure reinforcement learning to match OpenAI o1 — at 95% less cost
The company developed DeepSeek-R1 by using pure reinforcement learning on top of DeepSeek-V3-Base, and matched or beat o1 on some benchmarks.
DeepSeek-R1’s reasoning performance marks a big win for the Chinese startup in the US-dominated AI space, especially as the entire work is open-source, including how the company trained the model. The company started from DeepSeek-V3-Base as the base model and developed its reasoning capabilities without supervised data, relying only on the model's self-evolution through a pure RL-based trial-and-error process. This ability, which emerged from the training itself, lets the model solve increasingly complex reasoning tasks by using extended test-time computation to explore and refine its thought processes in greater depth.
Or read this on Venture Beat