We Bought the Whole GPU, So We're Damn Well Going to Use the Whole GPU

p 28, 2025 · 24 min read

Benjamin Spector*, Jordan Juravsky*, Stuart Sul*, Dylan Lim, Owen Dugan, Simran Arora, Chris Ré

Intro Post | Code | Low-Latency Megakernels | Brr

TLDR: We're releasing a throughput-optimized megakernel for tensor-parallel inference with Llama-70B on H100s. Our kernel aggressively overlaps compute, memory, and communication ops in order to simultaneously use the different hardware resources available on a GPU.
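To build intuition for why overlapping ops across different hardware resources pays off, here is a toy timing model (not the authors' code; the stage lengths and tile count are made-up illustration values). Each tile of work needs a memory load and a compute step; run serially, one resource idles while the other works, whereas a software pipeline lets the load of tile i+1 overlap the compute of tile i, so total time approaches the longer stage times the tile count rather than the sum of both stages.

```python
# Toy model of overlap (illustrative only; all numbers are assumptions).
# Each of N tiles needs a LOAD-unit memory transfer and a COMPUTE-unit
# matmul, executed on separate hardware pipes.
N, LOAD, COMPUTE = 8, 2, 3

# Serial schedule: only one pipe is busy at any moment.
serial = N * (LOAD + COMPUTE)

# Pipelined schedule: after the first load, each tile's compute (the
# longer stage here) fully hides the next tile's load.
pipelined = LOAD + N * max(LOAD, COMPUTE)

print(serial, pipelined)  # serial: 40 units, pipelined: 26 units
```

The same accounting extends to a third pipe for inter-GPU communication: as long as the stages are independent across tiles, total time is bounded by the busiest single resource, which is the motivation for packing all of this into one overlapping kernel.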
