Llama 3.1 405B now runs at 969 tokens/s on Cerebras Inference


Llama 3.1 405B on Cerebras is by far the fastest frontier model in the world – 12x faster than GPT-4o and 18x faster than Claude 3.5 Sonnet.

Related news:

Cerebras Inference now 3x faster: Llama3.1-70B breaks 2,100 tokens/s

Trump Crypto Project Website Crashes as Its Tokens Go on Sale

Llama 405B 506 tokens/second on an H200