
Meta unleashes Llama API running 18x faster than OpenAI: Cerebras partnership delivers 2,600 tokens per second


Meta partners with Cerebras to launch its new Llama API, offering developers AI inference speeds up to 18 times faster than traditional GPU solutions, challenging OpenAI and Google in the fast-growing AI services market.

Meta today announced a partnership with Cerebras Systems to power its new Llama API, giving developers access to inference speeds up to 18 times faster than traditional GPU-based solutions. A benchmark chart shows Cerebras processing Llama 4 at 2,648 tokens per second, well ahead of competitors SambaNova (747 tokens per second), Groq (600 tokens per second) and GPU-based services from Google and others, which explains Meta’s hardware choice for its new API. This speed advantage makes practical several categories of applications that were previously too slow to be usable, including real-time agents, low-latency conversational voice systems, interactive code generation, and instant multi-step reasoning. All of these require chaining multiple large language model calls, which can now complete in seconds rather than minutes.
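To make the "seconds rather than minutes" claim concrete, here is a rough back-of-the-envelope sketch in Python. The throughput figures come from the benchmark chart cited above. Everything else is an assumption for illustration: the workload of 10 chained calls at 1,000 output tokens each is hypothetical, and the GPU baseline is simply inferred by dividing the Cerebras figure by the quoted 18x factor. The estimate ignores network latency and prompt processing.

```python
# Throughput (tokens per second) as quoted from the article's benchmark chart.
THROUGHPUT_TPS = {"Cerebras": 2648, "SambaNova": 747, "Groq": 600}

# Hypothetical workload (an assumption, not from the article):
# an agent that chains 10 sequential model calls of 1,000 output tokens each.
STEPS = 10
TOKENS_PER_STEP = 1000


def chain_seconds(tokens_per_second: float, steps: int, tokens_per_step: int) -> float:
    """Estimate generation time for sequential calls.

    Ignores network latency and prompt processing, so real-world
    totals would be higher.
    """
    return steps * tokens_per_step / tokens_per_second


# Assumed GPU baseline, inferred from the "up to 18 times faster" claim.
gpu_tps = THROUGHPUT_TPS["Cerebras"] / 18

providers = {**THROUGHPUT_TPS, "GPU baseline (assumed)": gpu_tps}
for name, tps in providers.items():
    print(f"{name}: {chain_seconds(tps, STEPS, TOKENS_PER_STEP):.1f} s")
```

Under these assumptions the chain finishes in about 4 seconds on Cerebras versus a little over a minute on the inferred GPU baseline, which is consistent with the article's framing.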

Read this on VentureBeat
