Get the latest tech news
Faster Inverse BWT (2021)
The BWT (Burrows Wheeler Transform) has long fascinated people for its ability to capture complex correlations with a very simple inverse t...
The forward transform also has interesting efficiency issues, mainly also relating to cache misses for large buffers or to GPU implementations, and there has been lots of good research on that; see the papers. In this blog I will assume you don't care much about forward transform speed, so I'll push all the slow operations to the encoder side to make the decoder as fast as possible. Independent multiple streams allow you to keep more execution units busy, especially in a case like this where there is a fundamental cache miss stall that you want to fill with useful CPU work during those cycles.
Or read this on Hacker News