Searching a Codebase in English

Cosine similarity in code vs. text.

To retrieve, we start by generating a semantic vector embedding for the query, in this case, "Word storing frequently accessed key-value pairs". We then compare it against our database of vectors and find the one(s) that match most closely, i.e., have the highest cosine similarity to the query (for normalized vectors, this is the same as the highest dot product). Surely you could split up a codebase into files or functions as the “chunks”, embed them, and do a similar semantic similarity-based search.
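The retrieval step described above can be sketched in a few lines of Python. This is a minimal illustration, not the article's actual pipeline: `embed` is a hypothetical stand-in that builds a bag-of-words count vector instead of calling a real embedding model, and the `chunks` dictionary and `search` helper are made up for the example. Only the ranking logic (score every chunk by cosine similarity, keep the highest) reflects the technique in the text.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model (assumption): word-count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Dot product of the two vectors divided by the product of their norms."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query: str, chunks: dict[str, str], top_k: int = 1) -> list[tuple[str, float]]:
    """Return the top_k chunk names with the HIGHEST similarity to the query."""
    query_vec = embed(query)
    scored = [
        (name, cosine_similarity(query_vec, embed(text)))
        for name, text in chunks.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical "database" of chunks; in practice these would be stored embeddings.
chunks = {
    "cache": "a cache is a store for frequently accessed key-value pairs",
    "queue": "a queue holds items in first-in first-out order",
    "parser": "a parser turns source text into a syntax tree",
}

results = search("Word storing frequently accessed key-value pairs", chunks)
print(results[0][0])
```

With a real embedding model the vectors would be dense floats rather than word counts, but the comparison would look the same: the best match is the chunk whose vector has the largest cosine similarity to the query vector.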