Understanding the BM25 full text search algorithm
BM25 is a widely used algorithm for full-text search. I wanted to understand how it works, so here is my attempt to understand it by re-explaining it.
At the most basic level, the goal of a full-text search algorithm is to take a query and find the most relevant documents from a set of possibilities. (In contrast, vector similarity search might use an embedding model trained on an external corpus of text to represent the meaning or semantics of the query and document.) One intuition behind BM25's scoring is that, once a query term has appeared in a document often enough, the document is probably related to that term, so we don't want unbounded repetition to keep adding heavily to the score.
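The intuition about repetition can be made concrete with a small sketch. The function below is a simplified version of BM25's term-frequency component, assuming the document-length normalization is switched off and using `k1 = 1.2`, a commonly cited default; the name `saturated_tf` is made up for this illustration and is not from any library.

```python
def saturated_tf(tf: float, k1: float = 1.2) -> float:
    """Simplified BM25 term-frequency component (no length normalization).

    Grows with the raw term count `tf`, but never reaches k1 + 1,
    so each extra repetition of a term adds less than the one before.
    """
    return tf * (k1 + 1) / (tf + k1)


# Illustrative term counts, chosen only to show the curve flattening out.
counts = [1, 2, 5, 10, 100]
scores = [saturated_tf(tf) for tf in counts]

for tf, score in zip(counts, scores):
    print(f"tf={tf:>3}  score={score:.3f}")
```

Going from 1 to 2 occurrences raises the score noticeably, while going from 10 to 100 barely moves it, and the value always stays below `k1 + 1`. A raw term count, by comparison, would let a document that repeats a word 100 times score 100 times higher than one that mentions it once.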