Basic Facts about GPUs


Making sure I don’t forget what I read.

For a single, complex operation with high potential arithmetic intensity (such as matrix multiplication), the strategy is tiling: threads within a block cooperate to load tiles of the input matrices into the SM’s fast, on-chip Shared Memory, then reuse each loaded value many times before returning to global memory. With enough reuse, the kernel shifts from memory-bound to compute-bound. A compute-bound kernel can still be slow if its FLOPs are inefficient (e.g., scalar FP32 math instead of specialized hardware such as Tensor Cores) or if the GPU runs below its peak clock speed because of power limits.
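
To make the effect of tiling concrete, here is a rough back-of-the-envelope sketch in Python. It uses a simplified model that is an assumption, not something taken from the post: square FP32 matrices, one T x T tile of each input loaded per step, output writes and hardware caches ignored. The function names and the peak throughput and bandwidth figures are hypothetical placeholders, not the specifications of any real GPU.

```python
def naive_intensity(bytes_per_elem=4):
    """FLOPs per byte when every operand is read from global memory.

    Each multiply-add is 2 FLOPs and reads 2 elements (one from each input).
    """
    return 2.0 / (2 * bytes_per_elem)


def tiled_intensity(tile, bytes_per_elem=4):
    """FLOPs per byte when a tile x tile block of each input is staged on-chip.

    Per step a block loads 2 * tile^2 elements from global memory and
    performs tile^3 multiply-adds (2 * tile^3 FLOPs) on them.
    """
    flops = 2 * tile ** 3
    bytes_loaded = 2 * tile ** 2 * bytes_per_elem
    return flops / bytes_loaded


def attainable_flops(intensity, peak_flops, bandwidth):
    """Roofline-style bound: limited by compute or by memory traffic."""
    return min(peak_flops, intensity * bandwidth)


if __name__ == "__main__":
    # Hypothetical hardware numbers, chosen only for illustration.
    PEAK_FLOPS = 20e12   # 20 TFLOP/s
    BANDWIDTH = 1e12     # 1 TB/s

    print(f"naive: {naive_intensity():.2f} FLOP/byte")
    for tile in (16, 32, 64, 128):
        ai = tiled_intensity(tile)
        bound = attainable_flops(ai, PEAK_FLOPS, BANDWIDTH)
        regime = "compute-bound" if bound >= PEAK_FLOPS else "memory-bound"
        print(f"tile={tile:3d}: {ai:5.1f} FLOP/byte, "
              f"{bound / 1e12:4.1f} TFLOP/s attainable ({regime})")
```

Under this model the arithmetic intensity grows linearly with the tile size (T/4 FLOP/byte for FP32), which is why larger tiles push a kernel toward the compute-bound regime. The model says nothing about whether the FLOPs themselves are efficient, so the Tensor Core and clock-speed caveats still apply.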
