RDNA 4's “Out-of-Order” Memory Accesses
Examining RDNA 4's out-of-order memory accesses in detail and investigating them through testing
I’m surprised by the results, but they are convincing evidence that RDNA 3 and older AMD GPU architectures did have false cross-wave memory delays, where one wave ends up waiting on another wave's unrelated memory access. Those GPUs possibly had a shared memory access queue, with each entry tagged with the ID of the wave that issued it. While the implementation details differ, RDNA 4, GCN, and GPUs from Intel and Nvidia can all absorb cache misses without immediately stalling a thread.
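To make the idea of a false cross-wave delay concrete, here is a toy model in Python. It is a sketch under stated assumptions, not a description of AMD's hardware: the `Request` type, the `return_times` function, and the latency figures are all invented for illustration. It compares a single queue that returns data strictly in issue order across all waves with a scheme where ordering is only enforced within each wave.

```python
from dataclasses import dataclass


@dataclass
class Request:
    wave: int     # ID of the wave that issued the access
    issue: int    # cycle at which the access was issued
    latency: int  # cycles until the data is ready (made-up numbers)


def return_times(requests, shared_queue):
    """Cycle at which each request's data can be handed back to its wave.

    shared_queue=True  models one in-order queue shared by all waves.
    shared_queue=False models ordering enforced only within a wave.
    Both are hypothetical simplifications.
    """
    last_done = {}  # latest completion time seen per ordering domain
    done = []
    for r in requests:  # requests are listed in issue order
        ready = r.issue + r.latency
        domain = 0 if shared_queue else r.wave
        finish = max(ready, last_done.get(domain, 0))
        last_done[domain] = finish
        done.append(finish)
    return done


reqs = [
    Request(wave=0, issue=0, latency=500),  # wave 0 takes a long cache miss
    Request(wave=1, issue=1, latency=50),   # wave 1 hits in cache
]

shared = return_times(reqs, shared_queue=True)
per_wave = return_times(reqs, shared_queue=False)
print(shared)    # [500, 500]: wave 1 is held up behind wave 0's miss
print(per_wave)  # [500, 51]: wave 1 gets its data as soon as it is ready
```

In the shared-queue case, wave 1's fast access cannot return until wave 0's unrelated miss completes. With per-wave ordering, that delay disappears.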