Cerebras launches inference for Llama 3.1; benchmarked at 1846 tokens/s on 8B

Read more on:

Cerebras

inference

tokens

Related news:

Cerebras Inference: AI at Instant Speed

Cerebras gives waferscale chips inferencing twist, claims 1,800 token per sec generation rates

Google Cloud Run embraces Nvidia GPUs for serverless AI inference
