Optimizing AI Inference at Character.ai
At Character.AI, we're building toward AGI. In that future state, large language models (LLMs) will enhance daily life, providing business productivity and entertainment and helping people with everything from education to coaching, support, brainstorming, creative writing and more. To make that a reality globally, it's critical to achieve highly efficient inference, the process by which LLMs generate replies.
To put our serving volume in perspective, it is roughly 20% of the request volume served by Google Search, which processes around 105,000 queries per second according to third-party estimates (Statista, 2024). Taken together, the innovations discussed above achieve unprecedented efficiency and reduce inference costs to a level that makes it far easier to serve LLMs at scale.
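As a rough back-of-envelope check, assuming the 105,000 queries-per-second third-party estimate for Google Search, the implied serving volume behind that 20% comparison works out as follows (the variable names below are illustrative, not from the post):

```python
# Back-of-envelope estimate of the implied inference volume,
# assuming ~105,000 Google Search queries/second (Statista, 2024)
# and traffic that is roughly 20% of that volume, per the comparison above.
google_search_qps = 105_000        # third-party estimate
fraction_of_google = 0.20          # "roughly 20%" from the post

implied_inference_qps = google_search_qps * fraction_of_google
print(f"Implied inference volume: ~{implied_inference_qps:,.0f} queries/second")
# -> Implied inference volume: ~21,000 queries/second
```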