Get the latest tech news
The Beautiful Math of Bloom Filters
y Noam Yadgar Probabilistic functions can model many algorithms and procedures. They can help us optimize procedures to produce the best results.
Size/4 million itemsHash functionsFalse positive3.125Mb44.99%3.75Mb52.7%4.79Mb61%6.25Mb80.25% Optimal filter size Now that we know how to express \(k\) in terms of \(m\) and \(n\), we can reverse our original question of false positive probability and write a fully generalized equation of which we can find the optimal ratio between the filter size to the number of items that yields the lowest false positive probability. Using elegant math, we designed a process that predicts items’ membership in an extensive dataset with an outstanding 1% false positive rate. With just a few megabytes and a small set of non-cryptographic hash functions, it’s easy to see the usefulness of a Bloom filter, especially in BigData systems that provide free text-based searches in enormous datasets.
Or read this on Hacker News