Using the Matrix Cores of AMD RDNA 4 architecture GPUs

This article is a quick how-to guide for using the WMMA intrinsics with our new AMD RDNA™ 4 architecture GPUs.

We changed the VGPR layout for the arguments of Wave Matrix Multiply Accumulate (WMMA) operations compared to the previous RDNA 3 generation [1]. Using WMMA takes two steps. First, each lane of a wavefront allocates VGPRs for its part of the 16x16 matrices and loads that part into them. Second, once the matrices are loaded into VGPRs, all the lanes in the wavefront call a WMMA intrinsic, which triggers the execution of the GEMM operation in the matrix core.
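The two steps can be sketched on the CPU to make the data flow concrete. The following is a minimal emulation in plain C++, not the real intrinsic: it assumes a wave32 wavefront, and the per-lane layout it uses (each lane owns 8 consecutive row-major elements) is a hypothetical one chosen for readability, not the actual RDNA 4 VGPR layout. All names (`load_fragment`, `load_wave`, `gather`, `wmma_emulated`, `reference_gemm`) are made up for this illustration.

```cpp
#include <array>
#include <cassert>

constexpr int kTile = 16;                         // WMMA works on 16x16 tiles
constexpr int kLanes = 32;                        // assumption: wave32 wavefront
constexpr int kPerLane = kTile * kTile / kLanes;  // 8 elements held per lane

using Tile = std::array<float, kTile * kTile>;    // full row-major 16x16 matrix
using Frag = std::array<float, kPerLane>;         // one lane's slice, standing in for VGPRs
using WaveRegs = std::array<Frag, kLanes>;        // the slices of every lane in the wavefront

// Step 1: a lane loads its part of a matrix into its own registers.
// HYPOTHETICAL layout: lane L owns row-major elements [8L, 8L + 8).
Frag load_fragment(const Tile& m, int lane) {
    assert(lane >= 0 && lane < kLanes);
    Frag f{};
    for (int i = 0; i < kPerLane; ++i) f[i] = m[lane * kPerLane + i];
    return f;
}

// Runs step 1 for every lane of the wavefront.
WaveRegs load_wave(const Tile& m) {
    WaveRegs regs{};
    for (int lane = 0; lane < kLanes; ++lane) regs[lane] = load_fragment(m, lane);
    return regs;
}

// Inverse of load_wave: reassembles a full tile from the per-lane slices.
Tile gather(const WaveRegs& regs) {
    Tile m{};
    for (int lane = 0; lane < kLanes; ++lane)
        for (int i = 0; i < kPerLane; ++i) m[lane * kPerLane + i] = regs[lane][i];
    return m;
}

// Plain scalar reference: D = A * B + C.
Tile reference_gemm(const Tile& a, const Tile& b, const Tile& c) {
    Tile d = c;
    for (int i = 0; i < kTile; ++i)
        for (int j = 0; j < kTile; ++j)
            for (int k = 0; k < kTile; ++k)
                d[i * kTile + j] += a[i * kTile + k] * b[k * kTile + j];
    return d;
}

// Step 2: the wave-wide operation. It consumes the slices of ALL lanes at once
// and hands each lane back its slice of D = A * B + C.
WaveRegs wmma_emulated(const WaveRegs& a, const WaveRegs& b, const WaveRegs& c) {
    return load_wave(reference_gemm(gather(a), gather(b), gather(c)));
}
```

The point of the sketch is that no single lane ever holds a whole matrix: `wmma_emulated` only produces a correct tile because it sees the registers of all 32 lanes together, which is why the intrinsic has to be called from every lane of the wavefront. On real hardware the mapping inside `load_fragment` is fixed by the architecture, and that mapping is the part that differs between RDNA 3 and RDNA 4.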
