Get the latest tech news

The Birth of Parquet


By Julien Le Dem

On the other end, there was Vertica, a state-of-the-art Massively Parallel Processing database, leveraging columnar storage and vectorization to achieve low latency results. One of the benefits is that when you need to retrieve only a subset of the columns, which is very common, you can much more efficiently scan them from the disk in big chunks rather than doing a lot of small seeks. I took inspiration from the existing formats I could find (TFile, RCFile, CIF, Trevni) and the context of schema definition at Twitter (Thrift and Pig) and, over the summer of 2012, I started implementing the column spliting algorithm described in the Dremel paper.

Get the Android app

Or read this on Hacker News

Read more on:

Photo of birth

birth

Photo of Parquet

Parquet

Related news:

News photo

Parquet-WASM: Rust-based WebAssembly bindings to read and write Parquet data

News photo

The birth of a salesman

News photo

Nikon made an AI imaging camera that detects when cows are about to give birth