How Taalas “prints” an LLM onto a chip


A startup called Taalas recently released an ASIC running Llama 3.1 8B (3/6-bit quantization) at an inference rate of 17,000 tokens per second. That's like writing around 30 A4-sized pages in one second.
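
The page comparison can be checked with a quick back-of-the-envelope calculation. The sketch below is only illustrative: the 17,000 tokens per second figure comes from the text above, while the words-per-token ratio and the words-per-page count are assumptions (a common rule of thumb for English text and a typical single-spaced page), not numbers published by Taalas.

```python
TOKENS_PER_SECOND = 17_000   # throughput quoted in the text above
WORDS_PER_TOKEN = 0.75       # assumption: rough rule of thumb for English text
WORDS_PER_A4_PAGE = 425      # assumption: varies with font size, spacing, margins


def pages_per_second(tokens_per_second: float,
                     words_per_token: float = WORDS_PER_TOKEN,
                     words_per_page: float = WORDS_PER_A4_PAGE) -> float:
    """Convert a token throughput into an approximate A4-page throughput."""
    words_per_second = tokens_per_second * words_per_token
    return words_per_second / words_per_page


if __name__ == "__main__":
    print(f"{pages_per_second(TOKENS_PER_SECOND):.1f} pages per second")
```

Under these assumptions the result is about 30 pages per second. Varying the page density between 400 and 500 words gives roughly 25 to 32 pages, so the "around 30 pages" comparison holds across a plausible range.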
