The best way to use text embeddings portably is with Parquet and Polars


Never store embeddings in a CSV!

Ideally, we need a library that handles nested data easily, interoperates with numpy so embeddings can be serialized to a matrix, and runs fast dot products. pyarrow can read Parquet, but it is not a DataFrame library: even though the data sits in a Table, it is hard to slice and access, and the documentation suggests exporting to pandas for more advanced manipulation. For many applications, the combination of Parquet files and polars provides everything you need: efficient storage, fast similarity search, and easy metadata filtering.
