Get the latest tech news
Amazon's exabyte-scale migration from Apache Spark to Ray on EC2
Large-scale, distributed compute framework migrations are not for the faint of heart. There are backwards-compatibility constraints to maintain, performance expectations to meet, scalability limits to overcome, and the omnipresent risk of introducing breaking changes to production. This all becomes especially troubling if you happen to be migrating away from something that successfully processes exabytes of […]
They just flipped the switch to start quietly moving management of some of their largest production business intelligence (BI) datasets from Apache Spark over to Ray to help reduce both data processing time and cost. There were many factors that contributed to these results, including Ray’s ability to reduce task orchestration and garbage collection overhead, leverage zero-copy intranode object exchange during locality-aware shuffles, and better utilize cluster resources through fine-grained autoscaling. The purpose of this step was to get more direct comparisons with Apache Spark, verify that Ray’s benefits held up across a wide variety of notoriously problematic tables, and to smoke out any latent corner-case issues that only appeared at scale.
Or read this on Hacker News