
Alibaba Cloud says it cut Nvidia AI GPU use by 82% with new pooling system


A paper presented at SOSP 2025 details how token-level scheduling let a single GPU serve multiple LLMs, cutting the number of Nvidia H20 GPUs required from 1,192 to 213.
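The 82% figure in the headline follows directly from the two GPU counts in the summary. The short Python check below uses only those reported numbers and makes no assumptions about how the pooling system itself works:

```python
# GPU counts as reported in the summary: H20s needed before and after pooling.
gpus_before = 1192
gpus_after = 213

# Fractional reduction = 1 - (after / before).
reduction = 1 - gpus_after / gpus_before
print(f"Reduction: {reduction:.1%}")  # Reduction: 82.1%
```

The result, roughly 82.1%, matches the rounded figure Alibaba Cloud cites.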

