Google Cloud Rapid Storage
Announcing new optimized hardware options, plus software and consumption model updates.
AI workloads bring new and unique demands. Meeting them requires a finely crafted combination of hardware and software that delivers performance and efficiency at scale, along with the ease of use and flexibility to access that infrastructure however it's needed. Longer and highly variable context windows are enabling more sophisticated interactions, while reasoning and multi-step inference are shifting the incremental demand for compute, and therefore cost, from training to inference time (test-time scaling). Both features are available in preview today; together they reduce serving costs by more than 30%, cut tail latency by 60%, and increase throughput by up to 40% compared to other managed and open-source Kubernetes offerings.
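To see why test-time scaling moves the cost center from training to inference, a back-of-envelope sketch helps. The numbers below are illustrative assumptions, not figures from the announcement: it uses the common transformer approximations of roughly 6·N·D FLOPs for training and roughly 2·N FLOPs per generated token at inference, with a hypothetical 70B-parameter model and a hypothetical serving volume.

```python
# Rough FLOP approximations for a dense transformer (assumption:
# training ~ 6 * params * training_tokens; inference ~ 2 * params
# per generated token). All workload numbers below are hypothetical.

def training_flops(params: float, train_tokens: float) -> float:
    """Approximate total compute to train the model once."""
    return 6 * params * train_tokens

def inference_flops(params: float, tokens_per_request: float,
                    requests: float) -> float:
    """Approximate total compute to serve a given request volume."""
    return 2 * params * tokens_per_request * requests

N = 70e9   # hypothetical 70B-parameter model
D = 2e12   # hypothetically trained on 2T tokens

train = training_flops(N, D)

# A reasoning model emitting long chains of thought might generate
# ~10x the tokens of a plain completion for the same request.
plain     = inference_flops(N, tokens_per_request=500,  requests=1e9)
reasoning = inference_flops(N, tokens_per_request=5000, requests=1e9)

print(f"training:          {train:.2e} FLOPs")
print(f"plain serving:     {plain:.2e} FLOPs ({plain / train:.2f}x training)")
print(f"reasoning serving: {reasoning:.2e} FLOPs ({reasoning / train:.2f}x training)")
```

Under these assumptions, serving a billion long-reasoning requests costs compute on the same order as the entire training run, which is the shift the paragraph describes: once inference dominates, serving-side wins in cost, latency, and throughput matter more than training-side ones.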