Alibaba Cloud says it cut Nvidia AI GPU use by 82% with new pooling system
A paper presented at SOSP 2025 details how token-level scheduling let a single GPU serve multiple LLMs, cutting the number of Nvidia H20 GPUs required from 1,192 to 213.
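As a quick sanity check on the headline figure, the reported drop from 1,192 to 213 H20 GPUs works out to roughly an 82% reduction. The snippet below only reproduces that arithmetic from the two counts quoted above. It says nothing about how the paper measured them, and the variable names are illustrative.

```python
# GPU counts as reported in the article (before and after pooling)
gpus_before = 1_192
gpus_after = 213

# Fractional reduction = (before - after) / before
gpus_saved = gpus_before - gpus_after
reduction = gpus_saved / gpus_before

print(gpus_saved)           # 979
print(f"{reduction:.1%}")   # 82.1%
```

The result, about 82.1%, matches the rounded 82% in the headline.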
