Highly efficient matrix transpose in Mojo
In this blogpost I will show you step by step how to implement a highly efficient transpose kernel for the architecture using Mojo. The best kernel achieves...
After allocating the shared memory, we define the upper-left coordinate of the tile using x and y and get the row and column the current thread is responsible for. For a more detailed explanation of what swizzling is and how it works, please see my previous blogpost on matrix transpose; the concept is the same for Mojo. I hope this blogpost showed you how to achieve high performance on a common task in GPU computing using Mojo.
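To make the indexing above concrete, here is a minimal CPU-side sketch in plain Python. It is not the Mojo kernel from this post. The names `swizzle` and `transpose_tiled`, the tile size, and the simple XOR swizzle are all assumptions chosen for illustration, and the real kernel may use a different swizzle pattern. Each loop iteration over `tid` plays the role of one GPU thread, and the list `smem` stands in for the shared memory tile.

```python
TILE = 4  # hypothetical tile edge; must be a power of two for the XOR swizzle


def swizzle(row, col):
    """Illustrative XOR swizzle: permutes the columns within one tile row.

    For a power-of-two tile edge and row, col < tile, the result stays
    inside the same row of the tile.
    """
    return col ^ row


def transpose_tiled(src, rows, cols, tile=TILE):
    """Transpose a row-major rows x cols matrix through a staging tile."""
    dst = [0] * (rows * cols)
    # (x, y) is the upper-left coordinate of the current tile in the source.
    for y in range(0, rows, tile):
        for x in range(0, cols, tile):
            smem = [0] * (tile * tile)  # stand-in for shared memory
            # Load phase: each "thread" copies one element into the tile,
            # storing it at a swizzled column.
            for tid in range(tile * tile):
                row, col = divmod(tid, tile)
                if y + row < rows and x + col < cols:
                    smem[row * tile + swizzle(row, col)] = src[(y + row) * cols + (x + col)]
            # Store phase: each "thread" reads the mirrored tile element
            # (same swizzle, roles of row and col swapped) and writes it
            # to the transposed position in the output.
            for tid in range(tile * tile):
                row, col = divmod(tid, tile)
                out_row, out_col = x + row, y + col
                if out_row < cols and out_col < rows:
                    dst[out_row * rows + out_col] = smem[col * tile + swizzle(col, row)]
    return dst


if __name__ == "__main__":
    rows, cols = 5, 7
    matrix = list(range(rows * cols))
    print(transpose_tiled(matrix, rows, cols))
```

On a GPU the point of the swizzle is that the column-wise reads in the store phase no longer all hit the same shared memory bank. The Python model only demonstrates that the swizzled indexing still produces a correct transpose, including for matrix sizes that are not a multiple of the tile edge.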