Iceberg, the right idea – the wrong spec – Part 2 of 2: The spec
Let us finally look at what is so wrong with the Iceberg spec and why this simply isn't a serious attempt at solving the metadata problem of large Data Lakes. In the first part of this I took...
- Fragmentation of space
- Granular concurrency control
- Atomicity across multiple objects
- The impedance mismatch between rows and files
- Low-latency, high-throughput access to metadata

"The atomic swap needed to commit new versions of table metadata can be implemented by storing a pointer in a metastore or database that is updated with a check-and-put operation."

- Uses O(n) operations to add new metadata to an existing table, where O(1) or O(log n) should have sufficed
- Cannot handle cross-table commits
- Relies on file formats that are bloated and ineffective
- Tries to be a "file only" format, but still needs a database to operate
- Does not handle fragmentation and metadata bloat, and remains silent about the real complexities of that problem
- Does not handle row-level security, or security at all for that matter
- Fundamentally does not scale, because it uses a bad, optimistic concurrency model
- Is entirely unfit for trickle feeding of data, a hallmark feature of large Data Lakes
- Moves an extraordinary amount of complexity (and trust) to the client talking to it
- Makes proper caching and query planning (the cornerstones of good analytics) very difficult, if not impossible
- Has all the hallmarks of something designed by committee, completely lacking elegance
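The quoted commit mechanism (a pointer to the current table metadata, swapped with a check-and-put) is also where the optimistic concurrency criticized above comes from. Below is a minimal sketch of that idea, assuming an in-memory dict guarded by a lock stands in for the metastore or database. `Metastore`, `check_and_put` and `commit_with_retry` are hypothetical names for illustration, not Iceberg's API.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Metastore:
    """Hypothetical stand-in for the metastore/database that stores, per table,
    a pointer to the table's current metadata file."""
    pointers: Dict[str, str] = field(default_factory=dict)
    lock: threading.Lock = field(default_factory=threading.Lock)

    def get(self, table: str) -> Optional[str]:
        with self.lock:
            return self.pointers.get(table)

    def check_and_put(self, table: str, expected: Optional[str], new: str) -> bool:
        """Atomically replace the pointer with `new`, but only if it still
        equals `expected`. Returns False if another writer moved it first."""
        with self.lock:
            if self.pointers.get(table) != expected:
                return False
            self.pointers[table] = new
            return True


def commit_with_retry(
    store: Metastore,
    table: str,
    write_metadata: Callable[[Optional[str]], str],
    max_attempts: int = 10,
) -> Tuple[str, int]:
    """Optimistic commit: read the current pointer, write a new metadata file
    derived from it, then try to swap the pointer. If the swap fails, the work
    done for that attempt is discarded and the writer starts over."""
    for attempt in range(1, max_attempts + 1):
        base = store.get(table)
        new_location = write_metadata(base)  # hypothetical: builds on `base`
        if store.check_and_put(table, base, new_location):
            return new_location, attempt
    raise RuntimeError(f"gave up after {max_attempts} conflicting attempts")
```

Two writers that read the same base pointer cannot both succeed: the second `check_and_put` sees that the pointer has moved, returns `False`, and that writer has to rebuild its metadata and try again. Under heavy contention, such as many small trickle-fed commits, this retry loop is the cost the list above points to.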