Inference speedups



Researchers baked 3x inference speedups directly into LLM weights, without speculative decoding