Get the latest tech news

Initial CUDA Performance Lessons


I am somehow very late to learning CUDA. I didn’t even know until recently that CUDA is just C++ with a small amount of extra stuff. If I had known that there is so little friction to learnin…

Many years ago Sean Parent presented a graph that breaks down where the performance is in a modern PC. Meaning it’s not exactly a fair comparison because it operates on lower precision (the output is in 32 bits though) but everyone uses this for matrix multiplies and thinks it’s fine. When I ran Nsight Compute on the first couple versions of my code, the feedback that came back could always be summarized as “you’re barely using the GPU, make it more parallel.”

Get the Android app

Or read this on Hacker News