How DeepSeek-R1 Was Built, for Dummies
Learn how DeepSeek achieved OpenAI o1-level reasoning with pure RL and solved issues through multi-stage training.
While OpenAI has kept its methods under wraps, DeepSeek is taking the opposite approach: sharing its progress openly and earning praise for staying true to the open-source mission.
An example of multi-stage training: train a model on general text data, then refine it with reinforcement learning on user feedback to improve its conversational abilities.
If the labeled data is incomplete, biased, or doesn't cover the full range of tasks, the critic can only provide feedback within those constraints, and it won't generalize well.