How DeepSeek-R1 Was Built, for Dummies

Learn how DeepSeek reached OpenAI o1-level reasoning with pure reinforcement learning (RL), then solved the remaining issues through multi-stage training.

While OpenAI has kept its methods under wraps, DeepSeek is taking the opposite approach: sharing its progress openly and earning praise for staying true to the open-source mission.

An example of multi-stage training: train a model on general text data, then refine it with reinforcement learning on user feedback to improve its conversational abilities.

In RL approaches that rely on a critic trained on labeled data, the critic is only as good as that data. If the labeled data is incomplete, biased, or doesn't cover the full range of tasks, the critic can only provide feedback within those constraints, and it won't generalize well.
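The point about a critic being limited by its labeled data can be illustrated with a toy sketch. This is not DeepSeek's method or any real RL pipeline: the reward function, the single numeric feature `x`, and the names `true_reward` and `fit_linear_critic` are all hypothetical, chosen only to show a critic that looks accurate inside the range its labels cover and drifts badly outside it.

```python
def true_reward(x: float) -> float:
    """Hypothetical ground-truth quality of a response with feature x (peaks at x = 1)."""
    return 1.0 - (x - 1.0) ** 2


def fit_linear_critic(xs, ys):
    """Fit y ~ a*x + b by ordinary least squares and return the critic as a function."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return lambda x: a * x + b


# Assumption: labels exist only for x in [0, 0.5], a narrow slice of possible inputs.
labeled_xs = [i / 20 for i in range(11)]
labeled_ys = [true_reward(x) for x in labeled_xs]
critic = fit_linear_critic(labeled_xs, labeled_ys)

# Compare the critic with the true reward inside and outside the labeled range.
in_range_error = abs(critic(0.25) - true_reward(0.25))
out_of_range_error = abs(critic(2.0) - true_reward(2.0))

print(f"error inside labeled range:  {in_range_error:.3f}")
print(f"error outside labeled range: {out_of_range_error:.3f}")
```

Inside the labeled range the reward happens to rise almost linearly, so the critic's estimate is close. Outside it, the critic keeps extrapolating that upward trend while the true reward falls, which is the kind of failure to generalize described above.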
