DeepSeek-R1-Distill-Qwen-1.5B Surpasses GPT-4o in certain benchmarks
DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning performance. NOTE: the authors recommend setting an appropriate temperature (between 0.5 and 0.7) when running these models; otherwise you may encounter endless repetition or incoherent output.
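To see why the temperature setting matters, here is a minimal sketch of temperature-scaled softmax sampling, the mechanism the recommendation targets. This is an illustration, not DeepSeek's code: the logit values are made up, and the function names are ours. Lower temperatures sharpen the next-token distribution toward greedy decoding (which can loop), while the suggested 0.5–0.7 range keeps some diversity without flattening it into incoherence.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw logits into a probability distribution,
    scaled by temperature before normalizing."""
    # Dividing by a smaller temperature exaggerates differences
    # between logits, concentrating mass on the top token.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token logits for three candidate tokens.
logits = [2.0, 1.0, 0.5]

sharp = softmax_with_temperature(logits, 0.3)        # too greedy, risks loops
recommended = softmax_with_temperature(logits, 0.6)  # the suggested range
```

At temperature 0.3 the top token dominates the distribution far more than at 0.6, which is the behavior behind the repetition warning: a near-deterministic sampler can fall into a cycle it never escapes.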