The Economics of Speculative Decoding
Two underexplored axes: what MoE routing does to the decode roofline, and how compressed attention takes away the slack that used to make speculated tokens free.