Get the latest tech news
The best way to use text embeddings portably is with Parquet and Polars
Never store embeddings in a CSV!
Ideally, we need a library that can handle nested data easily and can interoperate with numpy for serializing to a matrix and can run fast dot products. But pyarrow is not a DataFrame library, and despite the data being in a Table, it’s hard to slice and access: the documentation suggests that you export to pandas if you need more advanced manipulation. For many applications, the combination of Parquet files and polars provides everything you need: efficient storage, fast similarity search, and easy metadata filtering.
Or read this on Hacker News