Lossless LLM 3x Throughput Increase by LMCache

Redis for LLMs.

LMCache is an LLM serving engine extension that reduces time to first token (TTFT) and increases throughput, especially in long-context scenarios. It stores the KV caches of reusable texts across several locations (GPU memory, CPU DRAM, and local disk) and reuses the KV cache of any repeated text, not only a shared prefix, in any serving engine instance. This saves precious GPU cycles and reduces response latency for users.
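To make the mechanism concrete, here is a minimal Python sketch. It is not LMCache's actual API or implementation. It assumes a hypothetical `TieredKVCache` that keeps KV entries in LRU order across three tiers (labelled gpu, cpu and disk, with made-up capacities), demotes evicted entries to the next slower tier, and promotes hits back to the fastest tier. Token chunks are keyed by a hash of their own content rather than of the whole prefix, which is what lets a repeated document match even when it follows a different prefix. The hypothetical `fake_kv` function stands in for a real model forward pass, and the sketch ignores the fact that a chunk's true KV values depend on the tokens before it, so reusing non-prefix chunks in a real engine needs extra handling.

```python
import hashlib
from collections import OrderedDict

CHUNK_TOKENS = 4  # hypothetical chunk size; real engines use much larger chunks


def chunk_keys(tokens, chunk=CHUNK_TOKENS):
    """Hash each full chunk by its own content (not its prefix), so an
    identical chunk matches wherever it appears in a prompt."""
    keys = []
    for i in range(0, len(tokens) - chunk + 1, chunk):
        data = ",".join(map(str, tokens[i:i + chunk])).encode()
        keys.append(hashlib.sha256(data).hexdigest())
    return keys


class TieredKVCache:
    """Hypothetical LRU cache over tiers ordered fastest first. Entries evicted
    from one tier are demoted to the next; the last tier discards them."""

    def __init__(self, tiers):
        # tiers: list of (name, capacity_in_chunks)
        self.tiers = [(name, cap, OrderedDict()) for name, cap in tiers]

    def put(self, key, kv):
        for _, _, store in self.tiers:
            store.pop(key, None)  # keep exactly one copy across all tiers
        self._insert(key, kv, 0)

    def _insert(self, key, kv, level):
        if level == len(self.tiers):
            return  # evicted from the slowest tier: discarded
        _, cap, store = self.tiers[level]
        store[key] = kv  # new entry lands at the most-recently-used end
        if len(store) > cap:
            old_key, old_kv = store.popitem(last=False)  # least recently used
            self._insert(old_key, old_kv, level + 1)

    def get(self, key):
        for name, _, store in self.tiers:
            if key in store:
                kv = store[key]
                self.put(key, kv)  # promote to the fastest tier
                return name, kv
        return None, None


def prefill(tokens, cache, compute_kv):
    """Reuse cached chunks and compute only the misses.
    Returns (tiers that served hits, number of chunks computed)."""
    hits, misses = [], 0
    for i, key in enumerate(chunk_keys(tokens)):
        tier, kv = cache.get(key)
        if kv is None:
            kv = compute_kv(tokens[i * CHUNK_TOKENS:(i + 1) * CHUNK_TOKENS])
            cache.put(key, kv)
            misses += 1
        else:
            hits.append(tier)
    return hits, misses


def fake_kv(chunk):
    # Placeholder for a model forward pass that would produce KV tensors.
    return tuple(t * 2 for t in chunk)


cache = TieredKVCache([("gpu", 2), ("cpu", 4), ("disk", 8)])
doc = list(range(100, 108))  # a shared 8-token document (2 chunks)
print(prefill([1, 2, 3, 4] + doc, cache, fake_kv))  # ([], 3)
print(prefill([9, 9, 9, 9] + doc, cache, fake_kv))  # (['cpu', 'cpu'], 1)
print(prefill(doc, cache, fake_kv))                 # (['gpu', 'gpu'], 0)
```

In the second call the document's chunks are served from the cpu tier, where they were demoted as newer chunks filled the two-slot gpu tier, even though the prompt starts with a different prefix; only the new 4-token prefix is computed. Content hashing only matches when chunk boundaries line up, which is why each prefix here is exactly one chunk long.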
