Get the latest tech news

The Beautiful Math of Bloom Filters


y Noam Yadgar Probabilistic functions can model many algorithms and procedures. They can help us optimize procedures to produce the best results.

Size/4 million itemsHash functionsFalse positive3.125Mb44.99%3.75Mb52.7%4.79Mb61%6.25Mb80.25% Optimal filter size Now that we know how to express \(k\) in terms of \(m\) and \(n\), we can reverse our original question of false positive probability and write a fully generalized equation of which we can find the optimal ratio between the filter size to the number of items that yields the lowest false positive probability. Using elegant math, we designed a process that predicts items’ membership in an extensive dataset with an outstanding 1% false positive rate. With just a few megabytes and a small set of non-cryptographic hash functions, it’s easy to see the usefulness of a Bloom filter, especially in BigData systems that provide free text-based searches in enormous datasets.

Get the Android app

Or read this on Hacker News

Read more on:

Photo of Bloom Filters

Bloom Filters

Photo of beautiful math

beautiful math

Related news:

News photo

When Bloom filters don't bloom (2020)

News photo

Using Parquet's Bloom Filters