Using the Matrix Cores of AMD RDNA 4 architecture GPUs
This article is a quick how-to guide for using the WMMA intrinsics with our new AMD RDNA™ 4 architecture GPUs.
Note that we changed the VGPR layout for the arguments of Wave Matrix Multiply Accumulate (WMMA) operations compared to the previous RDNA 3 generation [1]. Conceptually, a wavefront allocates multiple 16x16 matrices in VGPRs, with each matrix's elements spread across the lanes of the wavefront, loads them into the VGPRs, and then executes the operation. Once the matrices are loaded into VGPRs, a WMMA intrinsic is called from all the lanes in the wavefront, which triggers the execution of the GEMM operation in the matrix core.
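As a rough CPU-side illustration of that flow, the sketch below emulates a 32-lane wavefront that holds 16x16 tiles in per-lane registers and then performs one wavefront-wide multiply-accumulate, D = A * B + C. Everything here is an assumption made for illustration: the names (`load_fragment`, `gather`, `wmma_emulated`, `identity_check`) are hypothetical, the element-to-lane mapping is deliberately simple and is not the actual RDNA 4 VGPR layout, and plain `float` stands in for the reduced-precision inputs a real matrix core would take. On hardware, the triple loop would be replaced by a single WMMA intrinsic issued by all lanes.

```cpp
#include <array>
#include <cassert>

constexpr int kDim = 16;                        // WMMA tile is 16x16
constexpr int kLanes = 32;                      // assumed wave32 wavefront
constexpr int kPerLane = kDim * kDim / kLanes;  // 256 / 32 = 8 elements per lane

using Tile = std::array<float, kDim * kDim>;    // row-major 16x16 tile in memory
using LaneRegs = std::array<float, kPerLane>;   // the slice one lane keeps in its VGPRs
using Fragment = std::array<LaneRegs, kLanes>;  // the tile as held by the whole wavefront

// HYPOTHETICAL mapping, for illustration only (not the RDNA 4 layout):
// row-major element e lives in lane e / kPerLane, register slot e % kPerLane.
Fragment load_fragment(const Tile& t) {
    Fragment f{};
    for (int e = 0; e < kDim * kDim; ++e) {
        f[e / kPerLane][e % kPerLane] = t[e];
    }
    return f;
}

// Inverse of load_fragment: collect the per-lane slices back into a tile.
Tile gather(const Fragment& f) {
    Tile t{};
    for (int e = 0; e < kDim * kDim; ++e) {
        t[e] = f[e / kPerLane][e % kPerLane];
    }
    return t;
}

// Emulates the result of one wavefront-wide WMMA: D = A * B + C.
// All lanes contribute their slices; no single lane owns a whole matrix.
Fragment wmma_emulated(const Fragment& a, const Fragment& b, const Fragment& c) {
    const Tile A = gather(a);
    const Tile B = gather(b);
    Tile D = gather(c);  // start from the accumulator C
    for (int i = 0; i < kDim; ++i) {
        for (int j = 0; j < kDim; ++j) {
            for (int k = 0; k < kDim; ++k) {
                D[i * kDim + j] += A[i * kDim + k] * B[k * kDim + j];
            }
        }
    }
    return load_fragment(D);
}

// Usage example: with A = identity and C = all ones, D must equal B + 1.
bool identity_check() {
    Tile A{}, B{}, C{};
    for (int i = 0; i < kDim; ++i) A[i * kDim + i] = 1.0f;
    for (int e = 0; e < kDim * kDim; ++e) {
        B[e] = static_cast<float>(e % 7);
        C[e] = 1.0f;
    }
    const Tile D = gather(wmma_emulated(load_fragment(A), load_fragment(B), load_fragment(C)));
    for (int e = 0; e < kDim * kDim; ++e) {
        if (D[e] != B[e] + 1.0f) return false;
    }
    return true;
}
```

The point of the sketch is the bookkeeping: a 16x16 tile has 256 elements, so with 32 lanes each lane carries 8 of them, and the multiply-accumulate only makes sense as a collective operation across the wavefront. Which 8 elements a given lane holds is exactly what the VGPR layout defines, and that is the part that differs between RDNA 3 and RDNA 4.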