Notes on the New Deepseek v3
In this blog we go through the new DeepSeek v3 and compare it with GPT-4o and Claude 3.5 Sonnet across reasoning, math, coding, and writing tasks.
• Their success stems from breakthrough engineering: a Mixture-of-Experts (MoE) architecture, FP8 mixed-precision training, and a custom HAI-LLM training framework.
• They employ Multi-head Latent Attention (MLA), which compresses the Key-Value cache, reducing memory usage and enabling more efficient training (a rough sketch of this idea follows below).

To better understand how they compare, I tested all three models using my set of benchmark questions, focusing on four key areas: reasoning, math, coding, and creative writing.
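To make the MLA bullet concrete, here is a minimal PyTorch sketch of the latent KV-cache idea: each token's hidden state is down-projected to a small latent vector, only that latent is kept in the cache, and keys/values are reconstructed from it at attention time. The class name, dimensions, and layer layout below are illustrative assumptions, not DeepSeek v3's actual implementation.

```python
import torch
import torch.nn as nn

# Sketch of latent KV-cache compression (MLA-style idea).
# Instead of caching full per-head keys and values, cache one small
# latent vector per token and expand it back to K/V when needed.
# All sizes here are made up for illustration.

class LatentKVCache(nn.Module):
    def __init__(self, d_model=1024, n_heads=8, d_head=64, d_latent=128):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_head
        self.down = nn.Linear(d_model, d_latent, bias=False)           # compress hidden state
        self.up_k = nn.Linear(d_latent, n_heads * d_head, bias=False)  # rebuild keys
        self.up_v = nn.Linear(d_latent, n_heads * d_head, bias=False)  # rebuild values

    def forward(self, h):
        # h: (batch, seq, d_model). Only `latent` would be stored in the
        # inference cache: d_latent floats per token instead of
        # 2 * n_heads * d_head for a conventional KV cache.
        latent = self.down(h)
        k = self.up_k(latent).view(*h.shape[:2], self.n_heads, self.d_head)
        v = self.up_v(latent).view(*h.shape[:2], self.n_heads, self.d_head)
        return latent, k, v

x = torch.randn(1, 16, 1024)
cache, k, v = LatentKVCache()(x)
print(cache.shape, k.shape, v.shape)  # (1, 16, 128), (1, 16, 8, 64), (1, 16, 8, 64)
```

With these illustrative sizes, the cache holds 128 values per token rather than the 1,024 a full 8-head, 64-dimensional KV cache would need, which is where the memory savings come from.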