Parallel Reduce and Scan on the GPU
Programming, GPU, Optimizations, Algorithms
Reduce and scan are basic building blocks for more complex algorithms, e.g. solving linear equations or stream compaction. Since workgroups can’t synchronize with each other, we’ll need to use atomicAdd or simply run the entire algorithm in multiple passes. Similarly to reduce, each invocation in the subgroup will receive the partial sum corresponding to its index (in increasing order).
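To make the two ideas concrete, here is a minimal CPU-side sketch in Python, not GPU code and not necessarily the approach the article takes. It simulates the multi-pass option for reduce (each pass lets every "workgroup" sum its own slice, and passes repeat until one value is left) and an inclusive scan in which invocation i ends up with the sum of elements 0..i. The names `WORKGROUP_SIZE`, `reduce_pass`, `multipass_reduce`, and `subgroup_inclusive_scan` are hypothetical, and the scan uses a Hillis-Steele style doubling scheme chosen only for illustration.

```python
from typing import List

WORKGROUP_SIZE = 4  # hypothetical size, kept tiny so the passes are easy to follow


def reduce_pass(values: List[int], group_size: int = WORKGROUP_SIZE) -> List[int]:
    """One pass: each simulated workgroup sums its own slice and emits one partial sum."""
    return [sum(values[i:i + group_size]) for i in range(0, len(values), group_size)]


def multipass_reduce(values: List[int], group_size: int = WORKGROUP_SIZE) -> int:
    """Repeat passes until a single value remains (stands in for several dispatches)."""
    if group_size < 2:
        raise ValueError("group_size must be at least 2 so each pass shrinks the input")
    if not values:
        return 0
    while len(values) > 1:
        values = reduce_pass(values, group_size)
    return values[0]


def subgroup_inclusive_scan(values: List[int]) -> List[int]:
    """Hillis-Steele style inclusive scan: slot i ends up holding sum(values[0..i])."""
    current = list(values)
    offset = 1
    while offset < len(current):
        # Every "invocation" reads the previous step's buffer, as if in lockstep.
        current = [
            current[i] + current[i - offset] if i >= offset else current[i]
            for i in range(len(current))
        ]
        offset *= 2
    return current


if __name__ == "__main__":
    data = list(range(1, 11))
    print(multipass_reduce(data))
    print(subgroup_inclusive_scan([3, 1, 4, 1, 5]))
```

On a real GPU each pass would be a separate dispatch, so the partial sums written by one pass are visible to the next without any cross-workgroup synchronization. The atomicAdd alternative would instead have each workgroup add its partial sum into a single shared counter.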