Cerebras Inference now 3x faster: Llama 3.1-70B breaks 2,100 tokens/s


Today we’re announcing the biggest update to Cerebras Inference since launch. Cerebras Inference now runs Llama 3.1-70B at an astounding 2,100 tokens per second – a 3x performance boost over the prior release.
