AdapTive-LeArning Speculator System (ATLAS): Faster LLM inference
LLM inference that gets faster as you use it. Our runtime-learning accelerator adapts continuously to your workload, delivering 500 tokens per second (TPS) on DeepSeek-V3.1, a 4x speedup over the baseline, with no manual tuning.