Nvidia’s new Llama-3.1 Nemotron Ultra outperforms DeepSeek R1 at half the size


Compared to DeepSeek R1, Llama-3.1-Nemotron-Ultra-253B delivers competitive results with fewer than half as many parameters.

Its architecture, customized through a Neural Architecture Search (NAS) process, introduces structural variations such as skipped attention layers, fused feedforward networks (FFNs), and variable FFN compression ratios. Post-training then included supervised fine-tuning across domains such as math, code generation, chat, and tool use, followed by reinforcement learning with Group Relative Policy Optimization (GRPO) to further boost instruction-following and reasoning performance. These benchmark results suggest that, despite being a dense model, Nvidia's offering matches or exceeds mixture-of-experts (MoE) alternatives on reasoning and general instruction-alignment tasks, while trailing slightly in math-heavy categories.
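As a rough illustration of the structural variations described above, here is a minimal PyTorch sketch of a transformer block whose attention sublayer can be skipped entirely and whose FFN hidden width is set by a per-block compression ratio. All names, shapes, and choices here are hypothetical and do not reflect Nvidia's actual implementation.

```python
import torch
import torch.nn as nn

class NASBlock(nn.Module):
    """Hypothetical transformer block showing NAS-style variations:
    the attention sublayer can be skipped, and the FFN hidden width
    is scaled by a per-block compression ratio."""

    def __init__(self, d_model, n_heads, skip_attention=False, ffn_ratio=4.0):
        super().__init__()
        self.skip_attention = skip_attention
        if not skip_attention:
            self.norm1 = nn.LayerNorm(d_model)
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Variable FFN compression: hidden width = d_model * ffn_ratio,
        # so a ratio below the standard 4.0 shrinks the block's parameters.
        hidden = int(d_model * ffn_ratio)
        self.norm2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, hidden),
            nn.GELU(),
            nn.Linear(hidden, d_model),
        )

    def forward(self, x):
        if not self.skip_attention:
            h = self.norm1(x)
            attn_out, _ = self.attn(h, h, h, need_weights=False)
            x = x + attn_out
        return x + self.ffn(self.norm2(x))

# Per-layer choices like these are what a search procedure selects; e.g.
# some blocks drop attention and compress their FFN to cut cost.
layers = nn.ModuleList([
    NASBlock(1024, 16, skip_attention=False, ffn_ratio=4.0),
    NASBlock(1024, 16, skip_attention=True, ffn_ratio=1.5),
])
x = torch.randn(2, 8, 1024)
for layer in layers:
    x = layer(x)
```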

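The GRPO step mentioned above, as described in the DeepSeek work that introduced it, scores a group of sampled responses per prompt and normalizes each reward against the group's mean and standard deviation, which removes the need for a separate value model. A minimal PyTorch sketch of that group-relative advantage computation follows; the reward values are invented for illustration.

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Compute GRPO-style advantages for groups of sampled responses.

    rewards: shape (num_prompts, group_size), one scalar reward per response.
    Each response's advantage is its reward normalized by the mean and
    std of its own group, so no learned value function is required.
    """
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Example: one prompt, four sampled responses with hypothetical rewards.
rewards = torch.tensor([[0.2, 0.9, 0.5, 0.4]])
print(group_relative_advantages(rewards))
# Responses scoring above their group mean get positive advantages and
# are reinforced; those below the mean are pushed down.
```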
Read the full story on VentureBeat.

Read more on: Nvidia, DeepSeek R1

Related news:

$115 million just poured into this startup that makes engineering 1,000x faster — and Bezos, Altman, and Nvidia are all betting on its success

Engineering Software Startup Rescale Raises $115 Mil From Applied Materials, Nvidia

Nvidia, Cassava’s AI Factory in Africa Tie-Up to Cost $720 Million