DeepSeek's AI breakthrough bypasses industry-standard CUDA, uses PTX


Dramatic optimizations do not come easy.

According to an analysis from Mirae Asset Securities Korea cited by @Jukanlosreve, the breakthrough was achieved through a large number of fine-grained optimizations and the use of assembly-like PTX (Parallel Thread Execution) programming instead of Nvidia's CUDA. PTX is a close-to-metal ISA that exposes the GPU as a data-parallel computing device and therefore allows fine-grained optimizations, such as register allocation and thread/warp-level adjustments, that CUDA C/C++ and other high-level languages do not readily expose.

For example, when training its V3 model, DeepSeek reconfigured Nvidia's H800 GPUs: of the 132 streaming multiprocessors (SMs) on each GPU, it allocated 20 to server-to-server communication, possibly to compress and decompress data, work around the processor's connectivity limitations, and speed up transactions.
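The cost of that reconfiguration can be made concrete with a little arithmetic. The Python sketch below is purely illustrative: the function name `split_sms` is hypothetical, and the only inputs taken from the report are the 132 SMs per H800 and the 20 reportedly reserved for communication.

```python
def split_sms(total_sms: int, comm_sms: int) -> tuple[int, float]:
    """Hypothetical helper: return (SMs left for compute, fraction reserved for communication)."""
    if not 0 <= comm_sms <= total_sms:
        raise ValueError("comm_sms must be between 0 and total_sms")
    compute_sms = total_sms - comm_sms       # SMs still available for model math
    return compute_sms, comm_sms / total_sms  # share of the chip diverted to data movement

# Figures as reported for DeepSeek's V3 training setup on the H800
compute, reserved = split_sms(132, 20)
print(f"{compute} SMs for compute, {reserved:.1%} reserved for communication")
```

Run as-is, it prints `112 SMs for compute, 15.2% reserved for communication`, meaning roughly 15% of each GPU's SMs were reportedly traded away from computation to keep data moving between servers.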
