Google launches ‘implicit caching’ to make accessing its latest AI models cheaper
Google is rolling out a feature in its Gemini API, implicit caching, that the company claims will make its latest AI models cheaper for third-party devs.
Caching, a widely adopted practice in the AI industry, reuses frequently accessed or pre-computed data from models to cut down on computing requirements and cost. According to Google’s developer documentation, the minimum prompt token count for implicit caching is 1,024 for 2.5 Flash and 2,048 for 2.5 Pro; that’s not a terribly big amount, meaning it shouldn’t take much to trigger these automatic savings.
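To make the idea concrete, here is a minimal sketch of how implicit prefix caching can work in principle. This is not Google’s actual implementation: the token minimums below come from the documentation figures cited above, while the `PrefixCache` class, its methods, and the token values are purely illustrative. The gist is that requests whose prompts share a long enough common prefix can reuse the computation already done for that prefix.

```python
# Illustrative sketch only: a cache keyed on prompt prefixes.
# Minimum prompt token counts come from Google's developer docs
# (1,024 for 2.5 Flash, 2,048 for 2.5 Pro); everything else is hypothetical.
MIN_TOKENS = {"gemini-2.5-flash": 1024, "gemini-2.5-pro": 2048}

class PrefixCache:
    def __init__(self):
        self._cached_prefixes: list[list[str]] = []

    def lookup(self, tokens: list[str], model: str) -> int:
        """Return how many leading tokens of a prompt can be served from cache."""
        if len(tokens) < MIN_TOKENS[model]:
            return 0  # prompt too short to qualify for implicit caching
        best = 0
        for prefix in self._cached_prefixes:
            n = 0
            while n < min(len(prefix), len(tokens)) and prefix[n] == tokens[n]:
                n += 1
            best = max(best, n)
        return best

    def store(self, tokens: list[str], model: str) -> None:
        # Only prompts at or above the model's minimum are cached.
        if len(tokens) >= MIN_TOKENS[model]:
            self._cached_prefixes.append(tokens)

cache = PrefixCache()
system_prompt = ["tok"] * 1024                  # hypothetical shared prefix
first = system_prompt + ["question-a"]
second = system_prompt + ["question-b"]

cache.store(first, "gemini-2.5-flash")
hit = cache.lookup(second, "gemini-2.5-flash")
print(hit)  # 1024: the shared prefix is reused, only the tail is recomputed
```

In this toy model, the second request pays full cost only for the tokens after the shared prefix, which is why keeping common context at the start of a prompt is what lets this kind of caching kick in automatically.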