
The best way to use text embeddings portably is with Parquet and Polars

Never store embeddings in a CSV!

Ideally, we need a library that handles nested data easily, interoperates with NumPy so that embeddings can be serialized to a matrix, and runs fast dot products. But pyarrow is not a DataFrame library: even though the data sits in a Table, it’s hard to slice and access, and the documentation suggests exporting to pandas if you need more advanced manipulation. For many applications, the combination of Parquet files and Polars provides everything you need: efficient storage, fast similarity search, and easy metadata filtering.

