Berkeley Researchers Replicate DeepSeek R1's Core Tech for Just $30: A Small Mod
A Berkeley AI Research team led by PhD candidate Jiayi Pan has achieved what many thought impossible: reproducing DeepSeek R1-Zero's key technologies for just $30, less than the cost of a dinner for two. Using the countdown game as their testing ground, the team demonstrated that even modest language models can develop complex problem-solving strategies through reinforcement learning. Surprisingly, the choice of reinforcement learning algorithm (PPO, GRPO, or PRIME) mattered little: all three approaches achieved similar results.
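In the countdown game, the model is given a handful of numbers and a target and must write an arithmetic expression that reaches the target. Part of its appeal as an RL testbed is that answers can be checked automatically, with no human grader or learned reward model. Below is a minimal sketch of such a rule-based reward. It assumes each number must be used exactly once, that only `+`, `-`, `*` and `/` are allowed, and that a correct answer scores 1.0 while anything else scores 0.0. The function name `countdown_reward` is hypothetical, and this is not the Berkeley team's actual code.

```python
import ast
import operator
import re
from collections import Counter

# Binary operators permitted in a countdown expression (an assumption of this sketch).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node: ast.AST) -> float:
    """Recursively evaluate an arithmetic AST, rejecting any other node type."""
    if isinstance(node, ast.Expression):
        return _evaluate(node.body)
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return node.value
    raise ValueError("unsupported expression")


def countdown_reward(equation: str, numbers: list[int], target: int) -> float:
    """Hypothetical reward: 1.0 if `equation` uses each of `numbers` once and hits `target`."""
    equation = equation.strip()
    # Only digits, the four operators, parentheses and spaces may appear.
    if not re.fullmatch(r"[\d+\-*/() ]+", equation):
        return 0.0
    # Every given number must appear exactly once, and no other numbers may appear.
    used = [int(tok) for tok in re.findall(r"\d+", equation)]
    if Counter(used) != Counter(numbers):
        return 0.0
    try:
        # Parse into an AST and evaluate it ourselves instead of calling eval().
        value = _evaluate(ast.parse(equation, mode="eval"))
    except (SyntaxError, ValueError, ZeroDivisionError):
        return 0.0
    # Tolerance absorbs floating-point error from true division.
    return 1.0 if abs(value - target) < 1e-6 else 0.0


if __name__ == "__main__":
    print(countdown_reward("3 * 5 + 7", [3, 5, 7], 22))  # correct answer
    print(countdown_reward("3 * 5 - 7", [3, 5, 7], 22))  # wrong value
```

Evaluating the parsed syntax tree instead of calling `eval()` keeps model-generated text from executing arbitrary code. A binary reward like this is sparse, so the model only gets signal once it occasionally stumbles on a correct expression.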