Cerebras achieves 2,500 t/s on Llama 4 Maverick (400B)


At over 2,500 tokens per second (t/s), Cerebras has set a world record for LLM inference speed on the 400B parameter Llama 4 Maverick model, the largest in the Llama 4 family.

“Cerebras has beaten the Llama 4 Maverick inference speed record set by NVIDIA last week,” said Micah Hill-Smith, Co-Founder and CEO of Artificial Analysis.

Andrew Feldman, CEO of Cerebras Systems, said, “The most important AI applications being deployed in enterprise today (agents, code generation, and complex reasoning) are bottlenecked by inference latency. These use cases often involve multi-step chains of thought or large-scale retrieval and planning, with generation speeds as low as 100 tokens per second on GPUs, causing wait times of minutes and making production deployment impractical.”
