How fast is N tokens per second really?

Every local-LLM benchmark reports throughput: "47 tok/s on an M3," "180 tok/s on a 4090," "500 tok/s on Groq." Unless you've actually watched tokens stream at those rates, the numbers are hard to internalize. This is the rendering.
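One rough way to get a feel for these rates without the hardware is to convert them into wait times and replay text at the same pace. The sketch below is only an illustration: the function names are made up for this example, whitespace-separated words stand in for tokens (real tokenizers split text differently), and the 500-token response length is an arbitrary choice.

```python
import sys
import time


def seconds_per_token(tokens_per_second: float) -> float:
    """Gap between consecutive tokens at a steady rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return 1.0 / tokens_per_second


def time_to_generate(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to emit num_tokens at a steady rate."""
    return num_tokens * seconds_per_token(tokens_per_second)


def stream(text: str, tokens_per_second: float, out=sys.stdout) -> None:
    """Write text word by word at a fixed rate.

    Assumption: each whitespace-separated word counts as one token.
    """
    delay = seconds_per_token(tokens_per_second)
    for word in text.split():
        out.write(word + " ")
        out.flush()
        time.sleep(delay)
    out.write("\n")


if __name__ == "__main__":
    # The three rates quoted in the paragraph above.
    for rate in (47, 180, 500):
        wait = time_to_generate(500, rate)
        print(f"{rate} tok/s: {wait:.1f} s for a 500-token reply")

    # Replay a short sample at the slowest of the three rates.
    stream("the quick brown fox jumps over the lazy dog " * 5, 47)
```

Under these assumptions, a 500-token reply takes about 10.6 seconds at 47 tok/s and exactly one second at 500 tok/s. Changing the rate passed to `stream` shows how the same text feels at each speed.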
