DeepSeek-R1’s bold bet on reinforcement learning: How it outpaced OpenAI at 3% of the cost
DeepSeek-R1’s Monday release has sent shockwaves through the AI community, disrupting assumptions about what is required to achieve cutting-edge AI performance. This story focuses on exactly how DeepSeek managed this feat and what it means for the many users of AI models. For enterprises developing AI-driven solutions, DeepSeek’s breakthrough challenges assumptions about OpenAI’s dominance and offers a blueprint for cost-efficient innovation.
With Monday’s full release of R1 and the accompanying technical paper, the company revealed a surprising innovation: a deliberate departure from the conventional supervised fine-tuning (SFT) process widely used in training large language models (LLMs). While some flaws emerged, leading the team to reintroduce a limited amount of SFT during the final stages of building the model, the results confirmed the fundamental breakthrough: reinforcement learning (RL) alone could drive substantial performance gains. The paper notes that although RL produced unexpected and powerful reasoning behaviors, the intermediate model, DeepSeek-R1-Zero, faced challenges including poor readability and language mixing (starting in Chinese and switching to English, for example).
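The RL stage described in the paper relies on simple rule-based rewards rather than a learned reward model: one term checks whether the final answer is correct, another whether the reasoning is wrapped in the expected tags. Here is a minimal toy sketch of that idea; the function name, tag format, and weights are illustrative assumptions, not the paper’s actual implementation.

```python
import re

def rule_based_reward(completion: str, reference_answer: str) -> float:
    """Toy rule-based reward in the spirit of DeepSeek-R1-Zero's training:
    an accuracy term (does the final answer match the reference?) plus a
    format term (is the reasoning wrapped in <think>...</think> tags?).
    The 1.0/0.5 weights are illustrative, not taken from the paper."""
    # Format reward: reasoning must appear inside <think> ... </think>.
    format_ok = bool(re.search(r"<think>.+?</think>", completion, re.DOTALL))
    # Accuracy reward: compare the text left after stripping the think
    # block against the reference answer.
    answer = re.sub(r"<think>.*?</think>", "", completion, flags=re.DOTALL).strip()
    accuracy_ok = answer == reference_answer.strip()
    return 1.0 * accuracy_ok + 0.5 * format_ok

# A well-formatted, correct completion earns both reward terms;
# a bare, untagged answer that also mismatches earns neither.
good = "<think>2 + 2 is 4</think>4"
bad = "the answer is four"
```

Because the reward is computed by fixed rules over the model’s text, no human-labeled preference data is needed during this stage, which is what makes the SFT-free approach cheap to scale.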