I rebuilt FlashAttention in Triton to understand the performance archaeology
⏲️ Estimated reading time: ~45 min.

Flash Attention: From Theory to Implementation

Flash Attention has become one of the most impactful optimizations in modern deep learning.
