Cerebras Inference now 3x faster: Llama 3.1-70B breaks 2,100 tokens/s

Today we’re announcing the biggest update to Cerebras Inference since launch. Cerebras Inference now runs Llama 3.1-70B at an astounding 2,100 tokens per second – a 3x performance boost over the prior release.
