
Show HN: KVBoost – chunk-level KV cache reuse for HuggingFace, 5–48x faster TTFT


KVBoost: Faster LLM inference. Less VRAM.

Install: pip install kvboost
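The headline's core idea is to reuse the KV cache at chunk granularity, so prefill only runs on tokens that are not already cached. That is where a time-to-first-token (TTFT) saving would come from. The minimal pure-Python sketch below illustrates the general technique. It assumes prefix-keyed chunks: each key hashes every token up to and including that chunk, because under causal attention a chunk's keys and values depend on all earlier tokens. The names `ChunkKVStore`, `chunk_keys` and `CHUNK_SIZE` are hypothetical and are not KVBoost's API. The string values stand in for per-layer key/value tensors.

```python
import hashlib
from typing import Dict, List, Tuple

# Hypothetical chunk size for illustration; real systems typically use larger blocks.
CHUNK_SIZE = 4


def chunk_keys(token_ids: List[int], chunk_size: int = CHUNK_SIZE) -> List[str]:
    """Return one key per *full* chunk. Each key hashes the whole prefix up to
    that chunk, since a chunk's K/V depend on every earlier token (causal attention)."""
    keys = []
    h = hashlib.sha256()
    for i in range(len(token_ids) // chunk_size):
        chunk = token_ids[i * chunk_size:(i + 1) * chunk_size]
        # The separators keep concatenated chunks unambiguous.
        h.update((",".join(map(str, chunk)) + ";").encode())
        keys.append(h.hexdigest())  # hexdigest() does not finalize h; updating continues
    return keys


class ChunkKVStore:
    """Toy store that maps prefix-chunk keys to cached KV entries (stand-ins here)."""

    def __init__(self) -> None:
        self._store: Dict[str, object] = {}

    def insert(self, token_ids: List[int], kv_per_chunk: List[object]) -> None:
        # Cache the KV entry computed for each full chunk of this prompt.
        for key, kv in zip(chunk_keys(token_ids), kv_per_chunk):
            self._store[key] = kv

    def lookup(self, token_ids: List[int]) -> Tuple[int, List[object]]:
        """Return (number of tokens covered by the longest cached prefix, their KV entries)."""
        hits: List[object] = []
        for key in chunk_keys(token_ids):
            if key not in self._store:
                break  # the first miss ends the reusable prefix
            hits.append(self._store[key])
        return len(hits) * CHUNK_SIZE, hits
```

For example, after `store.insert(list(range(10)), ["kv0", "kv1"])`, looking up a new prompt `list(range(8)) + [99, 100, 101, 102]` returns `(8, ["kv0", "kv1"])`. Only the last four tokens would then need a prefill pass. The same chunk content at a different position is not reused, which is the conservative choice for exact outputs. Schemes that reuse position-shifted chunks need extra correction steps that this sketch does not model.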
