Get the latest tech news
CUDA Moat Still Alive
Intro SemiAnalysis has been on a five-month long quest to settle the reality of MI300X. In theory, the MI300X should be at a huge advantage over Nvidia’s H100 and H200 in terms of specifications an…
We ran unofficial MLPerf Training GPT-3 175B on 256 H100 in collaboration with Sustainable Metal Cloud to test the effects of different VBoost setting For AMD, Real World Performance on public stable released software is nowhere close to its on paper marketed TFLOP/s. AMD customers tend to use hand crafted kernels only for inference, which means their performance outside of very narrow well defined use cases is poor, and their flexibility to rapidly shifting workloads is non-existent. This docker image requires ~5 hours to build from source and installs dependencies and sub-dependencies (hipBLASLt, Triton, PyTorch, TransformerEngine), a huge difference compared to Nvidia, which offers a pre-built, out of the box experience and takes but a single line of code.
Or read this on Hacker News