Reducing the Cost of a Single Google Cloud Dataflow Pipeline by Over 60%
In this article we’ll present methods for optimizing physical resources and fine-tuning the configuration of a Google Cloud Platform (GCP) Dataflow pipeline in order to reduce its cost. The optimization is presented as a real-life scenario, carried out in stages.
To save time, I decided not to test every possible combination of machine families, disk types, and configuration options.

| Configuration | Processing cost for one day on the full dataset |
| --- | --- |
| n2-standard-4 + HDD | $350.02 |
| t2d-standard-8 + SSD + shuffle service turned off | $134.14 (~62% less than the original price) |

As we can see, the gain predicted from subsampling was achieved, and the savings are even 3 percentage points higher than estimated.
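For reference, here is a minimal sketch of how the cheaper configuration could be expressed as pipeline options in the Apache Beam Python SDK. This is not the article’s actual pipeline: the project, region, zone, bucket paths, and the trivial line-count body are placeholders. The flag names follow Beam’s worker options, and `shuffle_mode=appliance` is the documented experiment for opting a batch job out of the Dataflow Shuffle service.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Hypothetical job setup: project, region, zone, and bucket are placeholders.
options = PipelineOptions([
    "--runner=DataflowRunner",
    "--project=my-project",
    "--region=us-central1",
    "--temp_location=gs://my-bucket/tmp",
    # T2D machine family instead of the original N2 workers.
    "--machine_type=t2d-standard-8",
    # pd-ssd worker disks instead of pd-standard (HDD).
    "--worker_disk_type=compute.googleapis.com/projects/my-project/"
    "zones/us-central1-a/diskTypes/pd-ssd",
    # Opt the batch job out of the Dataflow Shuffle service, so shuffling
    # runs on the worker VMs themselves ("shuffle service turned off").
    "--experiments=shuffle_mode=appliance",
])

# Trivial placeholder pipeline; the real job would do the actual processing.
with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "Read" >> beam.io.ReadFromText("gs://my-bucket/input/*.txt")
        | "CountLines" >> beam.combiners.Count.Globally()
        | "Write" >> beam.io.WriteToText("gs://my-bucket/output/line-count")
    )
```

Note that with the shuffle service disabled, shuffle data lives on the worker disks, which is why the disk type (HDD vs. SSD) interacts with this setting and is worth benchmarking together with it.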