Get the latest tech news
GPU Prefix Sums: A nearly complete collection
A nearly complete collection of prefix sum algorithms implemented in CUDA, D3D12, Unity and WGPU. Theoretically portable to all wave/warp/subgroup sizes. - GitHub - b0nes164/GPUPrefixSums: A nearl...
GPUPrefixSums aims to bring state-of-the-art GPU prefix sum techniques from CUDA and make them available in portable compute shaders. In Decoupled Fallback, a threadblock will spin for a set amount of cycles while waiting for the reduction of a preceding partition tile. The prefix sum is one of the most important algorithmic primitives in parallel computing, underpinning everything from sorting, to compression, to graph traversal.
Or read this on Hacker News