How does Taalas “print” an LLM onto a chip?
A startup called Taalas recently released an ASIC chip that runs Llama 3.1 8B (with 3/6-bit quantization) at an inference rate of 17,000 tokens per second. That's roughly like writing 30 A4-sized pages in one second.
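The "30 pages per second" comparison is easy to sanity-check with some back-of-the-envelope arithmetic. The sketch below is only an estimate. It assumes about 0.75 English words per token (a common rule of thumb) and 400 to 500 words on a dense A4 page; neither figure comes from Taalas. It also shows the average time budget per token at the quoted rate.

```python
# Back-of-the-envelope check of the "about 30 A4 pages per second" comparison.
# ASSUMPTIONS (not from Taalas): ~0.75 English words per token, and
# 400-500 words on a single-spaced A4 page of plain text.

TOKENS_PER_SECOND = 17_000   # throughput quoted for Llama 3.1 8B
WORDS_PER_TOKEN = 0.75       # assumed rule of thumb for English text
WORDS_PER_PAGE_LOW = 400     # assumed dense A4 page
WORDS_PER_PAGE_HIGH = 500    # assumed very dense A4 page


def pages_per_second(tokens_per_s: float, words_per_token: float,
                     words_per_page: float) -> float:
    """Convert a token rate into an equivalent number of pages per second."""
    return tokens_per_s * words_per_token / words_per_page


def microseconds_per_token(tokens_per_s: float) -> float:
    """Average time budget per generated token, in microseconds."""
    return 1e6 / tokens_per_s


if __name__ == "__main__":
    words = TOKENS_PER_SECOND * WORDS_PER_TOKEN
    # Fewer words per page means more pages for the same text.
    most = pages_per_second(TOKENS_PER_SECOND, WORDS_PER_TOKEN, WORDS_PER_PAGE_LOW)
    fewest = pages_per_second(TOKENS_PER_SECOND, WORDS_PER_TOKEN, WORDS_PER_PAGE_HIGH)
    print(f"{words:.0f} words/s -> {fewest:.1f} to {most:.1f} A4 pages/s")
    print(f"{microseconds_per_token(TOKENS_PER_SECOND):.1f} us per token")
```

Under these assumptions the script prints about 25.5 to 31.9 pages per second and roughly 58.8 microseconds per token, which is consistent with the "around 30 pages" figure. Different tokenizers or page layouts would shift the range.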