Garbage collection of object storage at scale


Distributed systems built on object storage all share one problem: removing files that have been logically deleted, whether through data expiry or compaction. We review the pros and cons of five ways to solve this problem.

One consistent factor across all of these systems is how much time we spent solving what seems like a relatively straightforward problem: removing files from object storage that had been logically deleted, whether through data expiry or compaction. The WarpStream Agent will query the metadata store to find which file contains the batch of data that starts at offset 300 for partition 2 of the logs topic. Our BYOC deployment model meant that if we ever orphaned files in customer object storage buckets, we would have to involve the customers to clean them up, which didn’t feel acceptable to us.
