Notes on the New Deepseek v3


In this blog we go through the new Deepseek v3 and compare it with GPT-4o and Claude 3.5 Sonnet across reasoning, math, coding, and writing tasks.

• Deepseek's success stems from breakthrough engineering: a Mixture-of-Experts (MoE) architecture, FP8 mixed-precision training, and a custom HAI-LLM training framework.

• They employ Multi-head Latent Attention (MLA), which compresses the Key-Value cache, reducing memory usage and enabling more efficient training.

To better understand how these models compare, I tested all three using my set of benchmark questions, focusing on four key areas: reasoning, math, coding, and creative writing.
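The MLA idea can be sketched numerically: instead of caching full per-head keys and values for every past token, each token's hidden state is down-projected to a small latent vector, and the keys and values are re-expanded from that latent at attention time, so only the latent needs to be cached. The matrix names and dimensions below are illustrative stand-ins, not Deepseek's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (not Deepseek's real hyperparameters).
d_model, d_latent, n_heads, d_head, seq_len = 512, 64, 8, 64, 16

# Hypothetical projection matrices for the sketch.
W_dkv = rng.standard_normal((d_model, d_latent)) * 0.02            # down-projection to latent
W_uk = rng.standard_normal((d_latent, n_heads * d_head)) * 0.02    # up-projection for keys
W_uv = rng.standard_normal((d_latent, n_heads * d_head)) * 0.02    # up-projection for values

h = rng.standard_normal((seq_len, d_model))  # hidden states of cached tokens

# Standard attention caches full K and V for every head and token:
full_cache = 2 * seq_len * n_heads * d_head  # elements per layer (K + V)

# MLA caches only the compressed latent per token:
c = h @ W_dkv                # (seq_len, d_latent) -- this is all that is stored
latent_cache = c.size        # elements per layer

# Keys and values are reconstructed from the latent when needed:
K = c @ W_uk                 # (seq_len, n_heads * d_head)
V = c @ W_uv                 # (seq_len, n_heads * d_head)

print(f"full KV cache:  {full_cache} elements per layer")
print(f"latent cache:   {latent_cache} elements per layer")
print(f"compression:    {full_cache / latent_cache:.0f}x")
```

With these toy dimensions the per-layer cache shrinks 16x, which is the mechanism behind the memory savings described above; the trade-off is the extra matrix multiplies to rebuild K and V.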
