IndexCache, a new sparse attention optimizer, delivers 1.82x faster inference on long-context AI models


Read this article on VentureBeat

Read more on: faster inference, IndexCache, long-context AI models

Related news:

Executing programs inside transformers with exponentially faster inference

Mixture-of-recursions delivers 2x faster inference—Here’s how to implement it

AMD announces MI350X and MI355X AI GPUs, claims up to 4X generational performance gain, 35X faster inference