Google Cloud Run embraces Nvidia GPUs for serverless AI inference
Organizations no longer need to pay for long-running servers to handle AI inference: Google Cloud Run is previewing support for Nvidia L4 GPUs.
The GPUs enable real-time inference with lightweight open models such as Gemma 2B/7B or Llama 3 (8B), supporting use cases like responsive custom chatbots and on-the-fly document summarization. According to Google, cold start times range from 11 to 35 seconds across models including Gemma 2B, Gemma 2 9B, Llama 2 7B/13B, and Llama 3.1 8B, showcasing the platform's responsiveness. Each Cloud Run instance can attach one Nvidia L4 GPU with up to 24GB of VRAM, a solid level of resources for many common AI inference tasks.
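For those who want to try it, a minimal deployment sketch looks like the command below. The service name, project, and image path are placeholders, and since the feature is in preview, flag names and regional availability may change; check Google's current Cloud Run documentation before relying on it.

  # Sketch only: service name, project, and image are hypothetical placeholders.
  # GPU services run with CPU always allocated, hence --no-cpu-throttling,
  # and need generous CPU/memory alongside the L4.
  gcloud beta run deploy my-inference-service \
    --image=us-docker.pkg.dev/my-project/containers/gemma-server \
    --region=us-central1 \
    --cpu=4 --memory=16Gi \
    --gpu=1 --gpu-type=nvidia-l4 \
    --no-cpu-throttling

Because Cloud Run scales instances with demand, a service deployed this way only incurs GPU charges while it is actually serving requests, which is the core of the serverless-inference pitch.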
Or read this on VentureBeat