# MFU is Poorly Approximating Billions of Dollars in Compute

by [@bwasti](https://x.com/bwasti)

---

Model FLOPs Utilization (MFU) is ***the*** efficiency metric these days. It measures how effectively GPUs are being used during either training or inference.
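To make the metric concrete, here's a minimal sketch of how MFU is usually computed for training, using the standard 6ND approximation (roughly 2ND for the forward pass and 4ND for the backward pass). The function name and all hardware numbers below are illustrative, not measurements from the post:

```python
# Minimal MFU sketch using the common 6ND approximation for training FLOPs.
# All numbers in the example are hypothetical, not measurements.

def mfu_6nd(params: float, tokens: float, step_time_s: float,
            num_gpus: int, peak_flops_per_gpu: float) -> float:
    """MFU = achieved FLOPs/s divided by the hardware's peak FLOPs/s.

    Training FLOPs are approximated as 6 * N * D:
    ~2ND for the forward pass plus ~4ND for the backward pass.
    """
    achieved_flops_per_s = 6 * params * tokens / step_time_s
    peak_flops_per_s = num_gpus * peak_flops_per_gpu
    return achieved_flops_per_s / peak_flops_per_s

# Example: a 70B-parameter model consuming 4M tokens per 5-second step
# on 1,024 GPUs with ~1e15 peak FLOPs/s each (all hypothetical).
print(mfu_6nd(params=70e9, tokens=4e6, step_time_s=5.0,
              num_gpus=1024, peak_flops_per_gpu=1e15))  # ~0.33
```

A single scalar like this is exactly why MFU is popular, and also why it hides so much: every term in the denominator assumes the whole fleet should be doing matmuls at peak, all the time.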
***Parallllellism***

Besides breaking all the time, training and serving on thousands of GPUs is a fantastically annoying thing to do because there are *lots* of types of parallelism to manage.

![](https://i.imgur.com/vLSY2uC.png)

With these graphs you can iterate through nodes that carry all the information needed to derive FLOPs and other goodies like bytes moved. With this approach we can determine which operations *shouldn't* be maximizing FLOP utilization and come to more realistic and insightful conclusions about performance gaps, as in the sketch below.
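The post doesn't show the graph's node structure, so here's a hedged sketch of the idea: assume each node exposes its FLOPs and bytes moved, then use a roofline-style comparison of arithmetic intensity against the hardware's ridge point to flag ops that were never going to be FLOP-bound. The `Node` type and hardware constants are hypothetical:

```python
# Sketch of walking a compute graph to decide which ops *should* be
# FLOP-bound and which are inherently bandwidth-bound (roofline model).
# The Node type and hardware numbers are hypothetical, not from the post.
from dataclasses import dataclass

PEAK_FLOPS = 1e15      # peak FLOPs/s of the accelerator (illustrative)
PEAK_BYTES = 3.35e12   # peak memory bandwidth in bytes/s (illustrative)
RIDGE = PEAK_FLOPS / PEAK_BYTES  # intensity at the roofline's ridge point

@dataclass
class Node:
    name: str
    flops: float        # floating-point ops this node performs
    bytes_moved: float  # bytes read + written by this node

def classify(graph: list[Node]) -> None:
    for node in graph:
        intensity = node.flops / node.bytes_moved  # FLOPs per byte
        if intensity >= RIDGE:
            verdict = "compute-bound: FLOP utilization is the right target"
        else:
            verdict = "memory-bound: shouldn't be judged on FLOP utilization"
        print(f"{node.name}: {intensity:.1f} FLOPs/byte -> {verdict}")

# A big matmul vs. a pointwise op: only the former should chase peak FLOPs.
classify([
    Node("matmul_8192", flops=2 * 8192**3, bytes_moved=3 * 2 * 8192**2),
    Node("gelu", flops=8 * 8192**2, bytes_moved=2 * 2 * 8192**2),
])
```

Under these assumed numbers the matmul lands at thousands of FLOPs per byte (well above the ridge) while the pointwise op sits around 2, so counting the latter against peak FLOPs only makes the fleet look worse than it is.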