Replicating DeepSeek-R1 for $4500: RL Boosts 1.5B Model Beyond o1-preview


Democratizing Reinforcement Learning for LLMs. GitHub repository: agentica-project/deepscaler.

DeepScaleR is an open-source project to fully democratize reinforcement learning (RL) for LLMs and reproduce DeepSeek R1 and OpenAI o1/o3 at scale on real tasks. For all releases, we open-source all our efforts here, including training scripts (with hyperparameters), models, dataset, and logs. We achieve this by iteratively scaling DeepSeek's GRPO algorithm from 8K→16K→24K context length for thinking.
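
For readers unfamiliar with GRPO, the sketch below illustrates its central idea: each sampled response is scored relative to the other responses drawn for the same prompt, so no separate value model is needed. This is a minimal illustration, not the DeepScaleR training code. The function name, the binary reward values, the `eps` term, and the use of the population standard deviation are all assumptions made for the example; only the 8K, 16K, 24K schedule comes from the text above.

```python
from statistics import mean, pstdev

# Context-length stages named in the summary (8K -> 16K -> 24K tokens).
CONTEXT_STAGES = [8 * 1024, 16 * 1024, 24 * 1024]


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of responses to the same prompt.

    Hypothetical helper: subtracts the group mean and divides by the group
    standard deviation (population form assumed), with eps guarding against
    division by zero when every response receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical responses to one prompt, scored 1.0 if correct, else 0.0.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct responses end up with positive advantages and incorrect ones with negative advantages, and the values sum to roughly zero within the group. If every response in a group gets the same reward, all advantages are zero and that prompt contributes no learning signal.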
