CUDA Moat Still Alive


SemiAnalysis has been on a five-month-long quest to settle the reality of the MI300X. In theory, the MI300X should be at a huge advantage over Nvidia's H100 and H200 in terms of specifications an…

We ran an unofficial MLPerf Training GPT-3 175B benchmark on 256 H100s in collaboration with Sustainable Metal Cloud to test the effects of different VBoost settings.

For AMD, real-world performance on publicly released, stable software is nowhere close to its on-paper marketed TFLOP/s. AMD customers tend to use hand-crafted kernels only for inference, which means their performance outside of very narrow, well-defined use cases is poor, and their flexibility to adapt to rapidly shifting workloads is non-existent.

AMD's Docker image requires ~5 hours to build from source and installs dependencies and sub-dependencies (hipBLASLt, Triton, PyTorch, TransformerEngine). This is a huge difference compared to Nvidia, which offers a pre-built, out-of-the-box experience that takes but a single line of code.
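The gap between marketed and achieved TFLOP/s is easy to observe for yourself. Below is a minimal PyTorch microbenchmark sketch (the matrix size, dtype, and iteration counts are illustrative assumptions, not the harness used for the article's numbers): it times dense BF16 matmuls and reports achieved throughput, which on an H100 typically lands well below the roughly 989 dense-BF16 TFLOP/s on the spec sheet.

    import time
    import torch

    def measured_tflops(n=8192, iters=50, warmup=10):
        """Time dense BF16 matmuls and report achieved TFLOP/s (illustrative sizes)."""
        a = torch.randn(n, n, device="cuda", dtype=torch.bfloat16)
        b = torch.randn(n, n, device="cuda", dtype=torch.bfloat16)
        for _ in range(warmup):          # warm up kernels and clocks
            torch.matmul(a, b)
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            torch.matmul(a, b)
        torch.cuda.synchronize()         # wait for all queued matmuls to finish
        elapsed = time.perf_counter() - start
        flops = 2 * n**3 * iters         # 2*N^3 FLOPs per N-by-N matmul
        return flops / elapsed / 1e12

    if __name__ == "__main__":
        print(f"achieved: {measured_tflops():.1f} TFLOP/s")

The "single line" on the Nvidia side is, in practice, just pulling a pre-built NGC container, for example docker pull nvcr.io/nvidia/pytorch:24.10-py3 (tag illustrative), with all of the above dependencies already compiled in, rather than a multi-hour source build.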
