Long Convolutions via Polynomial Multiplication
A tutorial on long convolutions for GPT-like models.
We’ve been writing a series of papers (1, 2, 3) that have at their core so-called long convolutions, with an aim towards enabling longer-context models. We worked hard to fuse the convolution algorithm and make it efficient on modern hardware; check out FlashFFTConv for a primer there. The nice thing is that once we view a long convolution as polynomial multiplication, we can bring the tools of polynomial theory to bear on understanding these ML layers.
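To make the connection concrete, here is a minimal NumPy sketch, assuming a single-channel input `u` and a filter `k` of the same length. The function names are illustrative and are not taken from the papers or from FlashFFTConv. It computes a causal convolution three ways: directly, as the low-order coefficients of a polynomial product, and with a zero-padded FFT.

```python
import numpy as np

def causal_conv_direct(u, k):
    """Direct O(N^2) causal convolution: y[i] = sum_{j<=i} k[j] * u[i-j]."""
    n = len(u)
    y = np.zeros(n)
    for i in range(n):
        for j in range(i + 1):
            y[i] += k[j] * u[i - j]
    return y

def causal_conv_poly(u, k):
    """Read u and k as polynomial coefficients (lowest degree first).

    np.convolve returns the coefficients of the product polynomial;
    keeping the first N of them gives the causal convolution.
    """
    n = len(u)
    return np.convolve(u, k)[:n]

def causal_conv_fft(u, k):
    """Same product via FFT, zero-padded to 2N so nothing wraps around."""
    n = len(u)
    m = 2 * n
    y = np.fft.irfft(np.fft.rfft(u, n=m) * np.fft.rfft(k, n=m), n=m)
    return y[:n]

# Hypothetical small example: (1 + 2x + 3x^2) * (1 + x), truncated to degree < 3.
u = np.array([1.0, 2.0, 3.0])
k = np.array([1.0, 1.0, 0.0])
print(causal_conv_direct(u, k))
print(causal_conv_poly(u, k))
print(causal_conv_fft(u, k))
```

All three functions return the same values up to floating-point error. The FFT version costs O(N log N) rather than O(N^2), which is why it is the practical choice for long sequences.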