DeepSeek-R1-Distill-Qwen-1.5B Surpasses GPT-4o in certain benchmarks
DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning performance. NOTE: the authors recommend setting an appropriate temperature (between 0.5 and 0.7) when running these models; otherwise you may encounter endless repetition or incoherent output.
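To see why the temperature setting matters, here is a minimal sketch of temperature-scaled softmax sampling, the mechanism the recommendation targets. This is an illustration, not DeepSeek's code: the logit values are made up, and the function names are ours. Lower temperatures sharpen the next-token distribution toward greedy decoding (which can loop), while the suggested 0.5–0.7 range keeps some diversity without flattening it into incoherence.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw logits into a probability distribution,
    scaled by temperature before normalizing."""
    # Dividing by a smaller temperature exaggerates differences
    # between logits, concentrating mass on the top token.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token logits for three candidate tokens.
logits = [2.0, 1.0, 0.5]

sharp = softmax_with_temperature(logits, 0.3)        # too greedy, risks loops
recommended = softmax_with_temperature(logits, 0.6)  # the suggested range
```

At temperature 0.3 the top token dominates the distribution far more than at 0.6, which is the behavior behind the repetition warning: a near-deterministic sampler can fall into a cycle it never escapes.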