How Taalas “prints” an LLM onto a chip

A startup called Taalas recently released an ASIC that runs Llama 3.1 8B (with 3-bit/6-bit quantization) at an inference rate of 17,000 tokens per second. That is roughly the equivalent of writing 30 A4-sized pages of text every second.
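As a rough sanity check on the page comparison, the sketch below converts the reported token rate into pages per second. The words-per-token ratio (about 0.75 for English text) and the words-per-A4-page figures (400 to 500) are assumptions, not numbers from the source, and the function names are hypothetical.

```python
# Back-of-envelope check of the "about 30 A4 pages per second" comparison.
# ASSUMPTIONS (not from the source): ~0.75 English words per token and
# ~400-500 words of running text per A4 page.

TOKENS_PER_SECOND = 17_000  # rate reported for Llama 3.1 8B on the Taalas chip


def pages_per_second(tokens_per_s: float, words_per_token: float, words_per_page: float) -> float:
    """Convert a token rate into an approximate page rate (hypothetical helper)."""
    return tokens_per_s * words_per_token / words_per_page


def microseconds_per_token(tokens_per_s: float) -> float:
    """Average time budget per generated token, in microseconds."""
    return 1e6 / tokens_per_s


if __name__ == "__main__":
    for words_per_page in (400, 500):
        rate = pages_per_second(TOKENS_PER_SECOND, 0.75, words_per_page)
        print(f"{words_per_page} words/page -> {rate:.1f} pages/s")
    print(f"{microseconds_per_token(TOKENS_PER_SECOND):.1f} us per token")
```

Under these assumptions it prints 31.9 pages/s at 400 words per page and 25.5 pages/s at 500, so "around 30" is in the right range. The same rate also implies an average of about 58.8 microseconds per generated token.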
