GPUs Go Brrr


how make gpu fast?

But as HBM has gotten faster and the tensor cores have continued to grow out of proportion with the rest of the chip, even relatively small latencies, like those from shared memory, have become important to remove or hide. Motivated by a continuing proliferation of new architectures within the lab (and the fact that Flash Attention is like 1200 lines of code), we ended up designing a DSL embedded within CUDA -- at first for our own internal use. To show these primitives in action, consider Tri’s lovely flash attention -- a beautiful algorithm, but complicated to implement in practice, even on top of NVIDIA’s wonderful Cutlass library.
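The core trick that makes flash attention work is the streaming ("online") softmax: keys and values are processed in blocks while carrying a running maximum, a running denominator, and a rescaled output accumulator, and the exact softmax-weighted sum still falls out at the end. Here is a minimal CPU-side C++ sketch of that recurrence for a single query -- it illustrates the algorithm only, not a tiled GPU kernel, and names like `online_attention` are ours, not any library's API:

```cpp
// Reference sketch of the online-softmax recurrence behind flash attention,
// for a single query vector. Plain C++ on the CPU: it shows the algorithmic
// idea, not a GPU kernel or any particular DSL's primitives.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// Attend one query over keys/values streamed in blocks of `block` rows,
// never materializing the full score vector.
std::vector<float> online_attention(const std::vector<float>& q,
                                    const std::vector<std::vector<float>>& K,
                                    const std::vector<std::vector<float>>& V,
                                    size_t block) {
    const size_t d = q.size();
    std::vector<float> o(V[0].size(), 0.0f);  // rescaled output accumulator
    float m = -INFINITY;                      // running max of scores seen so far
    float l = 0.0f;                           // running softmax denominator

    for (size_t start = 0; start < K.size(); start += block) {
        const size_t end = std::min(start + block, K.size());

        // Scores for this block: s_j = q . k_j (softmax scaling omitted for brevity).
        std::vector<float> s(end - start);
        float block_max = -INFINITY;
        for (size_t j = start; j < end; ++j) {
            float dot = 0.0f;
            for (size_t k = 0; k < d; ++k) dot += q[k] * K[j][k];
            s[j - start] = dot;
            block_max = std::max(block_max, dot);
        }

        // New running max, and the factor that rescales the old state to it.
        const float m_new = std::max(m, block_max);
        const float scale = std::exp(m - m_new);
        l *= scale;
        for (float& x : o) x *= scale;

        // Fold this block's contribution into the denominator and the output.
        for (size_t j = start; j < end; ++j) {
            const float p = std::exp(s[j - start] - m_new);
            l += p;
            for (size_t k = 0; k < o.size(); ++k) o[k] += p * V[j][k];
        }
        m = m_new;
    }

    for (float& x : o) x /= l;  // final normalization
    return o;
}

int main() {
    std::vector<float> q = {1.0f, 0.0f};
    std::vector<std::vector<float>> K = {{1, 0}, {0, 1}, {1, 1}, {0, 0}};
    std::vector<std::vector<float>> V = {{1, 2}, {3, 4}, {5, 6}, {7, 8}};
    auto o = online_attention(q, K, V, /*block=*/2);
    std::printf("%.4f %.4f\n", o[0], o[1]);  // matches a full softmax over all 4 keys
    return 0;
}
```

In an actual GPU implementation, each block of keys and values becomes a tile held in shared memory or registers, and the rescale-and-accumulate step is exactly the kind of bookkeeping that tile-level primitives aim to make easy to express correctly.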
