Speculative KV coding: losslessly compressing KV cache by up to ~4×
Speculative KV coding losslessly compresses a target model's KV cache by up to roughly 4×, using a cheaper predictor model to supply the probabilities that drive an arithmetic coder.
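To give a rough feel for why a good predictor helps, here is a minimal sketch, not the method from the linked work. It computes the ideal arithmetic-coding cost of a symbol stream, which is the sum of -log2 p(symbol | context), under two predictors. The 16-symbol alphabet, the toy stream, and both predictor functions are hypothetical stand-ins chosen for illustration; a real system would code quantized KV-cache values and use an actual cheaper model as the predictor.

```python
import math

ALPHABET = 16  # assumption: symbols are 4-bit values (not from the source)


def ideal_code_length_bits(symbols, predict):
    """Sum of -log2 p(symbol | context) over the stream.

    An arithmetic coder driven by the same probabilities gets close to
    this total, so it serves as a proxy for compressed size.
    """
    total = 0.0
    for i, s in enumerate(symbols):
        probs = predict(symbols[:i])  # distribution over the next symbol
        total += -math.log2(probs[s])
    return total


def uniform_predictor(context):
    # Baseline: no knowledge, every symbol equally likely (4 bits each).
    return [1.0 / ALPHABET] * ALPHABET


def repeat_predictor(context):
    # Hypothetical stand-in for a "cheaper predictor model":
    # bets heavily that the next symbol repeats the previous one.
    if not context:
        return uniform_predictor(context)
    probs = [0.01] * ALPHABET   # 15 other symbols share 0.15 of the mass
    probs[context[-1]] = 0.85   # the remaining 0.85 goes to the repeat
    return probs


stream = [3, 3, 3, 3, 7, 7, 7, 7]  # toy data, made up for this sketch
baseline_bits = ideal_code_length_bits(stream, uniform_predictor)
predicted_bits = ideal_code_length_bits(stream, repeat_predictor)
print(f"uniform: {baseline_bits:.2f} bits, with predictor: {predicted_bits:.2f} bits")
```

Because the decoder can run the same predictor on the already-decoded context, it reconstructs the stream exactly, which is what keeps the scheme lossless. The better the predictor's guesses, the fewer bits each symbol costs.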

