Cerebras launches inference for Llama 3.1; benchmarked at 1846 tokens/s on 8B


Read more on:

Cerebras

inference

tokens

Related news:

Cerebras Inference: AI at Instant Speed

Cerebras gives waferscale chips inferencing twist, claims 1,800 token per sec generation rates

Google Cloud Run embraces Nvidia GPUs for serverless AI inference