Speeding up PyTorch inference by 87% on Apple devices with AI-generated Metal kernels

Authors: Taras Sereda, Natalie Serrino, Zain Asgar

tl;dr: Our lab investigated whether frontier models can write optimized GPU kernels for Apple devices to speed up inference. We found that they can: our AI-generated Metal kernels were 1.87x faster across 215 PyTorch modules, with some workloads running hundreds of times faster than baseline.
When evaluating the agent-generated kernels, we need to assess both correctness and performance relative to the baseline PyTorch implementation. (At the time of writing, torch.compile support for Metal is still underway, so it could not serve as a comparison point.) These results show that it's possible to drive significant improvements to model performance by automating kernel optimization, without any user code changes, new frameworks, or porting.
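The evaluation described above boils down to two checks per module: the generated kernel must agree numerically with the PyTorch baseline, and it must run faster. Below is a minimal sketch of that harness in plain Python. The `baseline_softmax` and `candidate_softmax` functions are hypothetical stand-ins (in the real setting they would be a PyTorch module and an agent-generated Metal kernel dispatched on the GPU); the tolerance and benchmarking loop are illustrative assumptions, not the lab's actual methodology.

```python
import math
import time

def baseline_softmax(xs):
    # Stand-in for the reference PyTorch implementation.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def candidate_softmax(xs):
    # Stand-in for an agent-generated kernel under evaluation.
    # Mathematically equivalent, but multiplies by a precomputed
    # reciprocal instead of dividing per element.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    inv = 1.0 / sum(exps)
    return [e * inv for e in exps]

def evaluate(baseline, candidate, inputs, atol=1e-6, runs=100):
    """Gate on correctness first, then measure speedup vs. baseline."""
    for xs in inputs:
        ref, out = baseline(xs), candidate(xs)
        if any(abs(a - b) > atol for a, b in zip(ref, out)):
            return {"correct": False, "speedup": None}

    def bench(fn):
        start = time.perf_counter()
        for _ in range(runs):
            for xs in inputs:
                fn(xs)
        return time.perf_counter() - start

    return {"correct": True, "speedup": bench(baseline) / bench(candidate)}

inputs = [[float(i % 7) for i in range(256)] for _ in range(10)]
result = evaluate(baseline_softmax, candidate_softmax, inputs)
print(result["correct"])
```

A kernel that fails the correctness gate is rejected outright, so an incorrect-but-fast candidate can never inflate the reported speedup; this mirrors the article's framing that both correctness and performance must be assessed.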