Cerebras Launches the Fastest AI Inference


20x the performance at one-fifth the price of GPUs, available today. Developers can now leverage the power of wafer-scale compute for AI inference via a simple API.
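The announcement doesn't show the API itself. As a hedged illustration only: Cerebras Inference exposes an OpenAI-compatible chat-completions interface, so a request can be sketched with Python's standard library. The endpoint URL, model name, and API-key placeholder below are assumptions and should be checked against the official Cerebras documentation.

```python
import json
import urllib.request

# Assumed OpenAI-compatible endpoint -- verify against the official
# Cerebras Inference docs before relying on it.
API_URL = "https://api.cerebras.ai/v1/chat/completions"


def build_chat_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build an OpenAI-style chat-completions request for Cerebras Inference."""
    payload = {
        "model": model,  # assumed model id, e.g. "llama3.1-8b" or "llama3.1-70b"
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Hypothetical usage; sending the request needs a real key:
#   resp = urllib.request.urlopen(build_chat_request("llama3.1-8b", "Hello!", "YOUR_API_KEY"))
req = build_chat_request("llama3.1-8b", "Hello!", api_key="YOUR_API_KEY")
print(req.full_url)
```

Because the interface mirrors the OpenAI chat-completions format, existing OpenAI client code can typically be pointed at the Cerebras endpoint by swapping the base URL and key.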

“Artificial Analysis has verified that Llama 3.1 8B and 70B on Cerebras Inference achieve quality evaluation results in line with native 16-bit precision per Meta’s official versions. With speeds that push the performance frontier and competitive pricing, Cerebras Inference is particularly compelling for developers of AI applications with real-time or high-volume requirements,” Hill-Smith concluded. Cerebras is proud to collaborate with industry leaders including Docker, Nasdaq, LangChain, LlamaIndex, Weights & Biases, Weaviate, AgentOps, and Log10 to drive the future of AI forward.

Related news:

Cerebras launches inference for Llama 3.1; benchmarked at 1846 tokens/s on 8B

Cerebras Inference: AI at Instant Speed

Cerebras gives waferscale chips inferencing twist, claims 1,800 token per sec generation rates