I rebuilt FlashAttention in Triton to understand the performance archaeology


Reimplementing FlashAttention for performance and giggles

⏲️ Estimated reading time: ~45 min.

Flash Attention: From Theory to Implementation

Flash Attention has become one of the most impactful optimizations in modern deep learning.
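The core idea behind FlashAttention is the online softmax: attention can be computed over tiles of K/V while maintaining a running row-max, running denominator, and partial output, so the full N×N score matrix is never materialized. This is not the author's Triton kernel, just a minimal NumPy sketch of that trick (function names and the `block` size are illustrative), checked against naive attention:

```python
import numpy as np

def naive_attention(q, k, v):
    # Standard attention: materializes the full (N, N) score matrix.
    s = q @ k.T / np.sqrt(q.shape[-1])
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    return p @ v

def flash_attention(q, k, v, block=4):
    # Online-softmax attention: processes K/V in tiles, keeping a
    # running max (m), running denominator (l), and partial output (o).
    n, d = q.shape
    scale = 1.0 / np.sqrt(d)
    m = np.full((n, 1), -np.inf)   # running row-max
    l = np.zeros((n, 1))           # running softmax denominator
    o = np.zeros((n, d))           # running (unnormalized) output
    for j in range(0, k.shape[0], block):
        kj, vj = k[j:j + block], v[j:j + block]
        s = q @ kj.T * scale                               # score tile (n, block)
        m_new = np.maximum(m, s.max(axis=-1, keepdims=True))
        p = np.exp(s - m_new)                              # tile weights, shifted by new max
        alpha = np.exp(m - m_new)                          # rescale old stats to new max
        l = alpha * l + p.sum(axis=-1, keepdims=True)
        o = alpha * o + p @ vj
        m = m_new
    return o / l

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 16)) for _ in range(3))
print(np.allclose(naive_attention(q, k, v), flash_attention(q, k, v)))  # True
```

The rescaling factor `alpha` is what makes the tiling exact rather than approximate: whenever a new tile raises the row maximum, previously accumulated sums are multiplied by `exp(m_old - m_new)` so every term ends up shifted by the same max. The real kernel fuses this loop into on-chip SRAM, which is where the memory-bandwidth savings come from.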


