Show HN: Vortex – a high-performance columnar file format
A toolkit for working with compressed Arrow in-memory, on-disk, and over-the-wire. "The LLVM of file formats" - spiraldb/vortex
Some key features are not yet implemented, both the API and the serialized format are likely to change in breaking ways, and we cannot yet guarantee correctness in all cases.

Searching over candidate encodings for each chunk sounds like it would be very expensive. However, given basic statistics about a chunk, it is possible to cheaply prune many encodings and keep the search space from exploding in size. Unlike in other array libraries, these statistics can be populated from disk formats such as Parquet and preserved all the way into a compute engine.
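To make the idea of statistics-based pruning concrete, here is a minimal sketch in Python. It is not Vortex's compressor or API. The encoding names, the size estimates, and the pruning thresholds are all hypothetical and chosen only for illustration. The sketch computes a few cheap per-chunk statistics (length, min, max, distinct count, run count), uses them to discard encodings that cannot pay off, and then picks the smallest estimated size among the survivors. Here the statistics are computed directly from the values. As noted above, in Vortex they could instead come from an on-disk format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkStats:
    """Cheap per-chunk statistics (hypothetical field names)."""
    length: int
    min: int
    max: int
    distinct: int
    runs: int  # number of maximal runs of equal consecutive values


def compute_stats(values: list[int]) -> ChunkStats:
    if not values:
        raise ValueError("empty chunk")
    # A new run starts wherever a value differs from its predecessor.
    runs = 1 + sum(1 for a, b in zip(values, values[1:]) if a != b)
    return ChunkStats(len(values), min(values), max(values), len(set(values)), runs)


def bits_needed(n: int) -> int:
    # Bits required to store any integer in [0, n]; at least 1.
    return max(1, n.bit_length())


def candidate_encodings(s: ChunkStats) -> dict[str, int]:
    """Return surviving encodings with a rough size estimate in bits.

    The pruning rules are illustrative assumptions, not Vortex's heuristics.
    """
    if s.distinct == 1:
        # A constant chunk stores one value; no other encoding can beat it,
        # so the rest of the search is skipped entirely.
        return {"constant": 64}
    candidates = {"plain": 64 * s.length}
    # Frame-of-reference + bit-packing: store min once, then narrow offsets.
    candidates["for_bitpacked"] = 64 + s.length * bits_needed(s.max - s.min)
    # Run-length is only considered when runs are much rarer than values.
    if s.runs * 2 <= s.length:
        candidates["run_length"] = s.runs * (64 + 32)  # value + 32-bit run length
    # Dictionary is only considered when there are few distinct values.
    if s.distinct * 4 <= s.length:
        candidates["dictionary"] = (
            s.distinct * 64 + s.length * bits_needed(s.distinct - 1)
        )
    return candidates


def choose_encoding(values: list[int]) -> str:
    cands = candidate_encodings(compute_stats(values))
    # Pick the encoding with the smallest estimated size.
    return min(cands, key=cands.get)
```

For example, a chunk of 500 zeros followed by 500 copies of `1_000_000` has a wide value range but only two runs, so `choose_encoding` selects `"run_length"`. An all-distinct chunk such as `list(range(100))` never evaluates run-length or dictionary encoding at all, because its statistics rule them out up front.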