
I spent 5 hours learning how ClickHouse built their internal data warehouse


The warehouse ingests 19 data sources and holds a total of 470 TB of compressed data.

ClickHouse is built for large-scale analytics: it excels in OLAP scenarios, delivering fast query execution even over massive datasets. Because Airflow jobs/DAGs can retry multiple times for the same data interval, using ReplicatedReplacingMergeTree makes the pipeline idempotent, allowing safe re-execution without creating duplicates. The warehouse exists because their earlier approach became unsustainable as they added more data sources, developed complex business metrics, and served a growing number of internal stakeholders.
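The idempotency idea above can be sketched concretely. A minimal, hypothetical example (table and column names are illustrative, not from the original post): a ReplacingMergeTree-family engine collapses rows that share the same `ORDER BY` key, keeping the one with the highest version column, so an Airflow retry that re-inserts the same interval does not produce lasting duplicates.

```python
# Hypothetical sketch of the pattern described above: a replicated,
# de-duplicating ClickHouse table plus a re-runnable per-interval insert.
# All names (events, staging.events, column names) are assumptions.

def create_table_ddl(table: str) -> str:
    """DDL for a ReplicatedReplacingMergeTree table keyed on
    (event_date, event_id); rows with the same key are collapsed at
    merge time, keeping the one with the latest inserted_at."""
    return f"""
    CREATE TABLE IF NOT EXISTS {table}
    (
        event_date  Date,
        event_id    UInt64,
        payload     String,
        inserted_at DateTime DEFAULT now()
    )
    ENGINE = ReplicatedReplacingMergeTree(
        '/clickhouse/tables/{{shard}}/{table}', '{{replica}}', inserted_at)
    ORDER BY (event_date, event_id)
    """

def insert_interval_sql(table: str, start: str, end: str) -> str:
    """Insert statement for one Airflow data interval. Safe to re-run:
    duplicate keys collapse on merge (query with FINAL to read the
    de-duplicated view before background merges complete)."""
    return (
        f"INSERT INTO {table} SELECT event_date, event_id, payload, now() "
        f"FROM staging.events "
        f"WHERE event_date >= '{start}' AND event_date < '{end}'"
    )
```

Note the design choice this sketch illustrates: retries are made harmless by the storage engine's key-based deduplication rather than by coordination logic in the DAG itself.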


