Garbage collection of object storage at scale
Distributed systems built on object storage all have one common problem: removing files that have been logically deleted either due to data expiry or compaction. We review the pros and cons of five ways to solve this problem.
One consistent factor across all of these systems is how much time we spent solving what seems like a relatively straightforward problem: removing files from object storage that had been logically deleted, either due to data expiry or compaction. Our BYOC deployment model meant that if we ever orphaned files in customer object storage buckets, we would have to involve the customer somehow to clean them up, which didn't feel acceptable to us.

To serve a read, the WarpStream Agent queries the metadata store to find which file contains the requested batch, for example the batch of data that starts at offset 300 for partition 2 of the logs topic.
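The following minimal in-memory sketch in Python makes the offset lookup and the cleanup problem concrete. It is not WarpStream's implementation. `MetadataStore`, `FileEntry`, `find_file`, `compact`, and `collect_garbage` are hypothetical names, and object storage is simulated by a Python set. The cleanup strategy shown is one generic approach and not necessarily one of the five reviewed here: logically deleted files are recorded, then physically removed only after a grace period so that readers who already resolved an old file can finish.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class FileEntry:
    path: str          # hypothetical object-storage key
    start_offset: int  # first offset stored in the file (inclusive)
    end_offset: int    # offset just past the last record (exclusive)


@dataclass
class MetadataStore:
    # (topic, partition) -> live files, sorted by start_offset
    index: dict = field(default_factory=dict)
    # logically deleted paths -> time they were removed from the index
    pending_deletes: dict = field(default_factory=dict)

    def add_file(self, topic: str, partition: int, entry: FileEntry) -> None:
        files = self.index.setdefault((topic, partition), [])
        files.append(entry)
        files.sort(key=lambda f: f.start_offset)

    def find_file(self, topic: str, partition: int, offset: int):
        """Return the path of the live file containing `offset`, or None."""
        files = self.index.get((topic, partition), [])
        starts = [f.start_offset for f in files]
        # Rightmost file whose start_offset <= offset.
        i = bisect.bisect_right(starts, offset) - 1
        if i >= 0 and offset < files[i].end_offset:
            return files[i].path
        return None

    def compact(self, topic: str, partition: int, merged: FileEntry, now: float) -> None:
        """Replace every file fully covered by `merged` with `merged` itself.

        Replaced files leave the index immediately (logical deletion) but
        stay in object storage until garbage collection removes them.
        """
        keep = []
        for f in self.index.get((topic, partition), []):
            if merged.start_offset <= f.start_offset and f.end_offset <= merged.end_offset:
                self.pending_deletes[f.path] = now
            else:
                keep.append(f)
        keep.append(merged)
        keep.sort(key=lambda f: f.start_offset)
        self.index[(topic, partition)] = keep


def collect_garbage(store: MetadataStore, bucket: set, now: float, grace_seconds: float) -> list:
    """Physically delete files logically deleted at least `grace_seconds` ago.

    The grace period lets readers that resolved an old path before compaction
    finish fetching it. Returns the deleted paths in sorted order.
    """
    deleted = []
    for path, deleted_at in list(store.pending_deletes.items()):
        if now - deleted_at >= grace_seconds:
            bucket.discard(path)  # stands in for an object-storage DELETE call
            del store.pending_deletes[path]
            deleted.append(path)
    return sorted(deleted)


if __name__ == "__main__":
    bucket = {"f1", "f2", "f3"}  # simulated bucket; f3 assumed already uploaded
    store = MetadataStore()
    store.add_file("logs", 2, FileEntry("f1", 0, 200))
    store.add_file("logs", 2, FileEntry("f2", 200, 400))
    print(store.find_file("logs", 2, 300))  # f2

    # Compaction commits f3 covering offsets 0-399; f1 and f2 become garbage.
    store.compact("logs", 2, FileEntry("f3", 0, 400), now=100.0)
    print(store.find_file("logs", 2, 300))  # f3
    print(collect_garbage(store, bucket, now=110.0, grace_seconds=60))  # []
    print(collect_garbage(store, bucket, now=200.0, grace_seconds=60))  # ['f1', 'f2']
    print(sorted(bucket))  # ['f3']
```

The sketch leaves a gap open. A file uploaded to the bucket but never committed to the metadata store (for example, because the writer crashed) never enters `pending_deletes`, so nothing ever deletes it. That is exactly the kind of orphaned file the BYOC concern above is about, and handling it requires some additional mechanism.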