Parallel Reduce and Scan on the GPU

Programming, GPU, Optimizations, Algorithms

Reduce and scan are basic building blocks for more complex algorithms, e.g. solving linear equations or stream compaction. Since workgroups can’t synchronize with each other, we’ll need to use atomicAdd or simply run the entire algorithm in multiple passes. Similarly to reduce, each invocation in the subgroup will receive the partial sum corresponding to its index (in increasing order).
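The multi-pass idea and the per-subgroup partial sums can be sketched on the CPU. This is a minimal Python simulation and not GPU code: the names `reduce_pass`, `multipass_reduce` and `subgroup_inclusive_scan`, and the group size of 4, are hypothetical choices made for illustration. It also assumes an inclusive scan, which the text does not specify.

```python
from itertools import accumulate


def reduce_pass(values, workgroup_size):
    """One simulated dispatch: every workgroup sums its own slice into one partial."""
    return [sum(values[i:i + workgroup_size])
            for i in range(0, len(values), workgroup_size)]


def multipass_reduce(values, workgroup_size=4):
    """Repeat passes until a single value is left.

    Each loop iteration stands in for one dispatch, so no synchronization
    between workgroups is needed inside a pass.
    """
    if workgroup_size < 2:
        raise ValueError("workgroup_size must be at least 2")
    if not values:
        return 0
    while len(values) > 1:
        values = reduce_pass(values, workgroup_size)
    return values[0]


def subgroup_inclusive_scan(values, subgroup_size=4):
    """Each simulated invocation receives the sum of its subgroup's elements up to its index."""
    out = []
    for i in range(0, len(values), subgroup_size):
        out.extend(accumulate(values[i:i + subgroup_size]))
    return out
```

With ten inputs and a group size of 4, the reduce takes two passes: the first produces three partials, the second collapses them into the final sum. The scan restarts at each subgroup boundary, so combining subgroups into a full scan would still need an extra step that this sketch leaves out.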
