Buckets of Parquet Files Are Awful
From the "not my job" department
Nowadays the most common way of sharing data seems to be "dump a terabyte of Parquet in an S3 bucket and call it a day." I do not care about data pipelines, Airflow, Spark, row groups, Arrow, Kafka, push-down filtering, or any of these other distractions from building. If you have had a ton of data dropped on you (like a billion dollars' worth of pennies), I will personally help you figure out how to turn it into something usable.