Reducing the Cost of a Single Google Cloud Dataflow Pipeline by Over 60%


In this article, we’ll present methods for efficiently optimizing physical resources and fine-tuning the configuration of a Google Cloud Platform (GCP) Dataflow pipeline in order to achieve cost reductions. The optimization is presented as a real-life scenario, carried out in stages.

To save time, I also decided not to test every possible combination of machine family, disk type, and configuration options. The table below compares the original configuration with the final, optimized one:

Configuration                                       Processing cost for one day on a full dataset
n2-standard-4 + HDD                                 $350.02
t2d-standard-8 + SSD + shuffle service turned off   $134.14 (~62% less than the original price)

As we can see, the gain predicted from subsampling was achieved, and the savings are even 3 percentage points higher than estimated.
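These settings translate into standard Dataflow pipeline options. Below is a minimal sketch of how the winning configuration could be expressed with the Apache Beam Python SDK; the script body, project, bucket, region, and zone are hypothetical placeholders rather than values from the article, and flag names should be checked against your SDK version:

# A minimal sketch (assumed names throughout) of launching a Dataflow
# batch job with the cost-optimized settings from the table above.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    "--runner=DataflowRunner",
    "--project=my-gcp-project",            # placeholder project id
    "--region=europe-west1",               # placeholder region
    "--temp_location=gs://my-bucket/tmp",  # placeholder staging bucket
    # Tau T2D workers instead of the original n2-standard-4.
    "--machine_type=t2d-standard-8",
    # SSD persistent disks for the workers instead of standard HDDs.
    "--worker_disk_type=compute.googleapis.com/projects/my-gcp-project/"
    "zones/europe-west1-b/diskTypes/pd-ssd",
    # Opt the batch job out of the Dataflow Shuffle service, so shuffling
    # runs on the (now SSD-backed) worker disks instead.
    "--experiments=shuffle_mode=appliance",
])

with beam.Pipeline(options=options) as p:
    # The real pipeline transforms would go here.
    _ = p | beam.Create([1, 2, 3]) | beam.Map(print)

Note how the two disk-related choices interact: with the Shuffle service turned off, shuffle data is spilled to the workers' persistent disks, which is exactly why upgrading those disks from HDD to SSD pays off.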
