The Illustrated DeepSeek-R1

A recipe for reasoning LLMs

It is significant not because it’s a great LLM to use, but because creating it required so little labeled data alongside large-scale reinforcement learning, and the result is a model that excels at solving reasoning problems.

An example prompt: “Write Python code that takes a list of numbers, returns them in sorted order, but also adds 42 at the start.”

If you’re new to the concept of Supervised Fine-Tuning (SFT), it is the process of presenting the model with training examples, each consisting of a prompt and its correct completion.
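To make the example prompt and the prompt-plus-completion shape of an SFT example concrete, here is a minimal Python sketch. It assumes "adds 42 at the start" means prepending 42 to the sorted list. The names `sorted_with_42`, `sft_example`, and `completion_passes` are hypothetical, chosen only for illustration, and the check is a toy rather than anything taken from the DeepSeek-R1 training pipeline.

```python
def sorted_with_42(numbers):
    """Return the numbers in ascending order with 42 placed at the start."""
    # Assumption: "adds 42 at the start" means prepending to the sorted result.
    return [42] + sorted(numbers)


# One SFT-style training example: a prompt paired with a correct completion.
# The dictionary layout is illustrative, not a specific library's format.
sft_example = {
    "prompt": (
        "Write python code that takes a list of numbers, returns them in a "
        "sorted order, but also adds 42 at the start."
    ),
    "completion": (
        "def sorted_with_42(numbers):\n"
        "    return [42] + sorted(numbers)\n"
    ),
}


def completion_passes(completion_source):
    """Run a candidate completion and check its behavior on two inputs.

    Toy check only: exec() on untrusted model output is unsafe outside a sandbox.
    """
    namespace = {}
    exec(completion_source, namespace)
    candidate = namespace["sorted_with_42"]
    return candidate([3, 1, 2]) == [42, 1, 2, 3] and candidate([]) == [42]


print(sorted_with_42([3, 1, 2]))
print(completion_passes(sft_example["completion"]))
```

A completion can be checked this way without a human-written label for every prompt, because running the code is enough to tell whether it behaves as requested.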