
ThunderKittens: Simple, fast, and adorable AI kernels


Oct 29, 2024 · 11 min read

Easier, Better, Faster, Cuter

Benjamin Spector, Simran Arora, Aaryan Singhal, Daniel Y. Fu, Chris Ré

Paper | Code | TK Part 1 | Brr

Fiveish months ago, we put out our posts on ThunderKittens and GPUs, and were pleasantly surprised by their warm reception on The Platform Formerly Known as Twitter. Following up on our recent LoLCATs work, we’ve included a forward prefill kernel and a demo integration. To try it, just run:

    cd demos/lolcats_demo && bash demo_8b.sh

This also comes with a pretty significant performance boost, because we finally figured out how to get rid of the stupid interleave format, reduce bank conflicts, and improve L2 transaction coalescing.
