DeepSeek-R1-Distill-Qwen-1.5B surpasses GPT-4o on certain benchmarks


DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning tasks. RL is applied directly to the base model, skipping SFT entirely. NOTE: The authors recommend setting an appropriate temperature (between 0.5 and 0.7) when running these models; otherwise you may encounter endless repetition or incoherent output.
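To see why the temperature setting matters, here is a minimal sketch of temperature-scaled sampling, the mechanism the recommendation targets. The logit values are illustrative, not taken from any real model; lower temperatures concentrate probability mass on the top token, while higher ones flatten the distribution.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then apply softmax.

    Lower temperature sharpens the distribution (more deterministic);
    higher temperature flattens it (more random).
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token logits for a three-token vocabulary
logits = [2.0, 1.0, 0.5]

sharp = softmax_with_temperature(logits, 0.6)  # within the recommended 0.5-0.7 range
flat = softmax_with_temperature(logits, 1.5)   # hotter: more uniform sampling
```

With the 0.6 temperature the top token receives noticeably more probability mass than at 1.5, which is the behavior the model card's recommendation relies on to avoid degenerate repetition loops.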



Related news:


DeepSeek claims its ‘reasoning’ model beats OpenAI’s o1 on certain benchmarks


Apple’s new AI model ReALM ‘surpasses GPT-4’


AI Surpasses Doctors In Spotting Early Breast Cancer Signs In NHS Trial