Google Cloud Run embraces Nvidia GPUs for serverless AI inference


Organizations no longer need to pay for long-running servers to handle AI inference, as Google Cloud Run previews support for Nvidia L4 GPUs.

The preview targets real-time inference with lightweight open models such as Gemma 2B/7B or Llama 3 8B, enabling responsive custom chatbots and on-the-fly document summarization tools. According to Google, cold start times range from 11 to 35 seconds across models including Gemma 2B, Gemma 2 9B, Llama 2 7B/13B, and Llama 3.1 8B, showcasing the platform's responsiveness. Each Cloud Run instance can be equipped with one Nvidia L4 GPU with up to 24GB of VRAM, a solid level of resources for many common AI inference tasks.
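For a sense of what this looks like in practice, here is a minimal deployment sketch using the gcloud CLI. It assumes the beta flags Google described for the GPU preview; the service name, project path, container image, and instance cap below are placeholders, not values from the announcement:

    # Deploy a container to Cloud Run with one Nvidia L4 GPU attached
    gcloud beta run deploy gemma-inference \
        --image us-docker.pkg.dev/my-project/my-repo/gemma-server:latest \
        --region us-central1 \
        --gpu 1 \
        --gpu-type nvidia-l4 \
        --cpu 4 \
        --memory 16Gi \
        --no-cpu-throttling \
        --max-instances 3    # cap concurrent GPU instances to bound spend

Because Cloud Run scales to zero between requests, the GPU is only billed while instances are actually serving traffic, which is the cost advantage over keeping a long-running inference server warm.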

Source: VentureBeat

Read more on: Google, inference, Nvidia GPUs

Related news:

Google denies reports that it’s discontinuing Fitbit products

UK’s competition authority ends probes of Apple and Google but will use incoming powers to ‘resolve app store concerns’

UK Closes Google, Apple Probes Ahead of New Digital Rules Regime Rollout