Cerebras Inference now 3x faster: Llama3.1-70B breaks 2,100 tokens/s
Today we’re announcing the biggest update to Cerebras Inference since launch. Cerebras Inference now runs Llama 3.1-70B at an astounding 2,100 tokens per second – a 3x performance boost over the prior release.