Speeding up PyTorch inference by 87% on Apple devices with AI-generated Metal kernels

Authors: Taras Sereda, Natalie Serrino, Zain Asgar


tl;dr: Our lab investigated whether frontier models can write optimized GPU kernels for Apple devices to speed up inference. We found that they can: our AI-generated Metal kernels were 1.87x faster across 215 PyTorch modules, with some workloads running hundreds of times faster than baseline.

When evaluating the agent-generated kernels, we assessed both correctness and performance relative to the baseline PyTorch implementation (at the time of writing, torch.compile support for Metal is still underway, so it could not serve as a comparison point). These results show that it is possible to drive significant improvements in model performance by automating kernel optimization, with no user code changes, new frameworks, or porting required.
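As a rough illustration of that evaluation, here is a minimal sketch of the kind of harness one might use: it checks a generated kernel for numerical agreement with the baseline PyTorch module, then times both on the MPS backend. The `evaluate_kernel` name, the tolerances, the iteration counts, and the single-tensor-output assumption are ours for illustration; they are not taken from the post.

```python
import time
import torch

def evaluate_kernel(baseline_module, candidate_fn, example_inputs,
                    rtol=1e-4, atol=1e-4, iters=100):
    """Check an agent-generated kernel against the baseline PyTorch
    module for correctness, then compare average latency on MPS.
    Returns the speedup (baseline_time / candidate_time), or None
    if the candidate fails the correctness check."""
    device = torch.device("mps")
    inputs = [x.to(device) for x in example_inputs]
    baseline_module = baseline_module.to(device).eval()

    with torch.no_grad():
        expected = baseline_module(*inputs)
        actual = candidate_fn(*inputs)

    # Correctness gate: reject the kernel if outputs diverge.
    if not torch.allclose(expected, actual, rtol=rtol, atol=atol):
        return None

    def bench(fn):
        # Warm up, then time `iters` runs; synchronize so queued
        # Metal work is included in the measurement.
        with torch.no_grad():
            for _ in range(10):
                fn(*inputs)
            torch.mps.synchronize()
            start = time.perf_counter()
            for _ in range(iters):
                fn(*inputs)
            torch.mps.synchronize()
        return (time.perf_counter() - start) / iters

    return bench(baseline_module) / bench(candidate_fn)
```

A return value above 1.0 means the generated kernel outruns baseline eager PyTorch; None means it failed the correctness gate and would be discarded.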
