Cerebras achieves 2,500 t/s on Llama 4 Maverick (400B)
At over 2,500 tokens per second (t/s), Cerebras has set a world record for LLM inference speed on the 400B-parameter Llama 4 Maverick model, the largest released model in the Llama 4 family.
“Cerebras has beaten the Llama 4 Maverick inference speed record set by NVIDIA last week,” said Micah Hill-Smith, Co-Founder and CEO of Artificial Analysis.

Andrew Feldman, CEO of Cerebras Systems, said, “The most important AI applications being deployed in enterprise today—agents, code generation, and complex reasoning—are bottlenecked by inference latency. These use cases often involve multi-step chains of thought or large-scale retrieval and planning, with generation speeds as low as 100 tokens per second on GPUs, causing wait times of minutes and making production deployment impractical.”
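To make the latency argument concrete, the back-of-the-envelope sketch below divides output length by decode speed. The 30,000-token workload is a hypothetical figure chosen for illustration, not a number from Cerebras or Artificial Analysis, and the model ignores prefill time, network overhead, and batching effects.

```python
def generation_time_s(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds to emit num_tokens at a constant decode rate.

    Simplifying assumption: ignores prompt prefill, network latency,
    and any variation in throughput over the course of generation.
    """
    return num_tokens / tokens_per_second


# Hypothetical workload: an agent whose multi-step chain emits 30,000 tokens in total.
WORKLOAD_TOKENS = 30_000

rates = [
    ("GPU, 100 t/s (figure from the quote)", 100),
    ("Cerebras, 2,500 t/s (reported)", 2_500),
]

for label, rate in rates:
    seconds = generation_time_s(WORKLOAD_TOKENS, rate)
    print(f"{label}: {seconds:.1f} s")
# prints:
# GPU, 100 t/s (figure from the quote): 300.0 s
# Cerebras, 2,500 t/s (reported): 12.0 s
```

Under these assumptions the same job drops from five minutes to about twelve seconds, a 25x reduction that tracks the ratio of the two decode speeds.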