Buckets of Parquet Files Are Awful


From the "not my job" department

Nowadays the most common way of sharing data seems to be "dump a terabyte of Parquet in an S3 bucket and call it a day." I do not care about data pipelines, Airflow, Spark, row groups, Arrow, Kafka, push-down filtering, or any of these other distractions from building. If you have had a ton of data dropped on you (like a billion dollars' worth of pennies), I will personally help you figure out how to turn it into something usable.

