Lessons learned operating petabyte-scale ClickHouse clusters: Part II
This is the second part of the series. Here's more of what I've learned from operating petabyte-scale ClickHouse clusters for the last 5+ years.
But that's expensive because you need to allocate hardware for both of them, and while real-time traffic is usually predictable, one-off queries just "happen", so you can't plan capacity for them ahead of time. At Tinybird, we shut a replica down with a multi-step process that takes care of all of this: first remove traffic from the replica, then wait for in-flight inserts and queries to finish, and kill them if needed. If a replica is down, ON CLUSTER operations will take a long time (depending on the config) and may cause issues in your app if timeouts are not set properly (and trust me, they're not).
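The multi-step shutdown described above can be sketched roughly as follows. This is a minimal illustration, not Tinybird's actual tooling: the function name `drain_replica`, its three callbacks, and the default grace and poll intervals are all hypothetical, chosen only to show the order of the steps.

```python
import time
from typing import Callable


def drain_replica(
    remove_traffic: Callable[[], None],
    count_in_flight: Callable[[], int],
    kill_in_flight: Callable[[], None],
    grace_seconds: float = 300.0,  # assumed default, tune for your workload
    poll_seconds: float = 5.0,     # assumed default
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Drain a replica before shutting it down.

    Returns True if in-flight work finished on its own,
    False if it had to be killed after the grace period.
    """
    # Step 1: stop routing new traffic to the replica.
    remove_traffic()

    # Step 2: wait for running inserts and queries, up to the grace period.
    deadline = clock() + grace_seconds
    while count_in_flight() > 0:
        if clock() >= deadline:
            # Step 3: the grace period ran out, kill whatever is still running.
            kill_in_flight()
            return False
        sleep(poll_seconds)
    return True
```

In a real deployment, `remove_traffic` would talk to your load balancer, `count_in_flight` could count rows in ClickHouse's `system.processes` table, and `kill_in_flight` could issue `KILL QUERY` statements. Those wirings are assumptions here and are left to the caller, which also keeps the drain logic easy to test without a cluster.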