ThunderKittens: Simple, fast, and adorable AI kernels
Oct 29, 2024 · 11 min read
Easier, Better, Faster, Cuter
Benjamin Spector, Simran Arora, Aaryan Singhal, Daniel Y. Fu, Chris Ré
Fiveish months ago, we put out our posts on ThunderKittens and GPUs, and were pleasantly surprised by their warm reception on The Platform Formerly Known as Twitter.

Following up on our recent LoLCATs work, we’ve included a forwards prefill kernel and a demo integration. To try it, just run cd demos/lolcats_demo && bash demo_8b.sh. This also comes with a pretty significant performance boost, because we finally figured out how to get rid of the stupid interleave format, reduce bank conflicts, and improve L2 transaction coalescing.