Researchers baked 3x inference speedups directly into LLM weights — without speculative decoding