The two versions of Parquet
Adoption of Parquet version 2 remains limited because support across the ecosystem is lacking, and this holds back the format's evolution despite version 2's improvements in compression and performance.
Ideally, the standard would be whatever the specification defines, but in reality there is no agreement on the minimum set of features an implementation must support to be considered compatible with version 2. In this Pull Request from the project that describes the file format, a discussion about what constitutes the core has been going on for four years, with no sign of a resolution anytime soon. The DuckDB article prompted me to investigate the performance implications of Parquet version 2, which I hadn’t considered in [my previous post on compression algorithms](/compression-algorithms-parquet/).