What Shapes Do Matrix Multiplications Like?
Divining order from the chaos
Tile quantization occurs when the size of your matrix multiplication increases such that the GPU needs to launch another “chunk” of work. Crucially, when tile quantization is the culprit, your absolute runtime still grows monotonically, although your efficiency may drop. Beyond the obvious matrix multiplication shape issues, performance loss due to wave quantization is often tricky to find, since it also depends on things like the batch size.
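To make the tile quantization effect concrete, here is a minimal sketch that counts how many output tiles a matmul needs and what fraction of the launched work is useful. The 128x128 tile size and the helper names `tile_count` and `tile_efficiency` are assumptions for illustration only; real kernels pick tile shapes with their own heuristics.

```python
def tile_count(m: int, n: int, tile_m: int = 128, tile_n: int = 128) -> int:
    """Number of output tiles needed to cover an m x n result (hypothetical tile size)."""
    # Ceiling division: a partially filled tile still costs a whole tile.
    tiles_along_m = (m + tile_m - 1) // tile_m
    tiles_along_n = (n + tile_n - 1) // tile_n
    return tiles_along_m * tiles_along_n


def tile_efficiency(m: int, n: int, tile_m: int = 128, tile_n: int = 128) -> float:
    """Fraction of launched tile area that holds real output elements."""
    launched = tile_count(m, n, tile_m, tile_n) * tile_m * tile_n
    return (m * n) / launched


if __name__ == "__main__":
    for m in (256, 257, 384):
        print(m, tile_count(m, 256), round(tile_efficiency(m, 256), 3))
```

Under these assumptions, growing the output from 256x256 to 257x256 raises the tile count from 4 to 6, so the work launched jumps by half while the useful work barely changes, and efficiency falls to about 0.67. The tile count never decreases as the shape grows, which matches the point that runtime stays monotonic even when efficiency drops.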