Notes on the New Deepseek v3
In this blog we go through the new DeepSeek v3 and compare it with GPT-4o and Claude 3.5 Sonnet across reasoning, math, coding, and writing tasks.
• Their success stems from breakthrough engineering: a Mixture-of-Experts (MoE) architecture, FP8 mixed-precision training, and a custom HAI-LLM training framework.
• They employ Multi-head Latent Attention (MLA), which compresses the Key-Value cache, reducing memory usage and enabling more efficient training (a rough sketch of this idea follows below).

To better understand how they compare, I tested all three models using my set of benchmark questions, focusing on four key areas: reasoning, math, coding, and creative writing.
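To make the MLA bullet concrete, here is a minimal PyTorch sketch of the latent KV-cache idea: each token's hidden state is down-projected to a small latent vector, only that latent is kept in the cache, and keys/values are reconstructed from it at attention time. The class name, dimensions, and layer layout below are illustrative assumptions, not DeepSeek v3's actual implementation.

```python
import torch
import torch.nn as nn

# Sketch of latent KV-cache compression (MLA-style idea).
# Instead of caching full per-head keys and values, cache one small
# latent vector per token and expand it back to K/V when needed.
# All sizes here are made up for illustration.

class LatentKVCache(nn.Module):
    def __init__(self, d_model=1024, n_heads=8, d_head=64, d_latent=128):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_head
        self.down = nn.Linear(d_model, d_latent, bias=False)           # compress hidden state
        self.up_k = nn.Linear(d_latent, n_heads * d_head, bias=False)  # rebuild keys
        self.up_v = nn.Linear(d_latent, n_heads * d_head, bias=False)  # rebuild values

    def forward(self, h):
        # h: (batch, seq, d_model). Only `latent` would be stored in the
        # inference cache: d_latent floats per token instead of
        # 2 * n_heads * d_head for a conventional KV cache.
        latent = self.down(h)
        k = self.up_k(latent).view(*h.shape[:2], self.n_heads, self.d_head)
        v = self.up_v(latent).view(*h.shape[:2], self.n_heads, self.d_head)
        return latent, k, v

x = torch.randn(1, 16, 1024)
cache, k, v = LatentKVCache()(x)
print(cache.shape, k.shape, v.shape)  # (1, 16, 128), (1, 16, 8, 64), (1, 16, 8, 64)
```

With these illustrative sizes, the cache holds 128 values per token rather than the 1,024 a full 8-head, 64-dimensional KV cache would need, which is where the memory savings come from.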