Llama 3.1 405B now runs at 969 tokens/s on Cerebras Inference
Llama 3.1 405B on Cerebras is by far the fastest frontier model in the world: its output speed is 12x that of GPT-4o and 18x that of Claude 3.5 Sonnet.