Bloom Filters by Example
A Bloom filter is a data structure designed to tell you, rapidly and memory-efficiently, whether an element is present in a set. The price paid for this efficiency is that a Bloom filter is a probabilistic data structure: it tells us that the element either definitely is not in the set or may be in the set.
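To make the "definitely not" versus "may be" distinction concrete, here is a minimal sketch in Python. The class name `BloomFilter`, the method names, and the choice of deriving all bit positions from a single SHA-256 digest by double hashing are assumptions made for illustration, not the implementation this article describes; a production filter would more likely use a fast non-cryptographic hash.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter: m bits, k derived bit positions per item."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m                            # number of bits in the filter
        self.k = k                            # number of positions set per item
        self.bits = bytearray((m + 7) // 8)   # bit array, initially all zero

    def _positions(self, item: str):
        # Assumption: derive k positions from one SHA-256 digest using
        # double hashing (h1 + i * h2), rather than k independent hashes.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        # Set every bit position derived from the item.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely not in the set";
        # True means only "may be in the set".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )
```

After `bf = BloomFilter(m=1024, k=3)` and `bf.add("apple")`, the call `bf.might_contain("apple")` always returns `True`. For an item that was never added, the result is usually `False`, but it can be `True` when other items happen to have set the same bits.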
To see the difference that a fast non-cryptographic hash function can make, check out this story of a ~800% speedup when switching a Bloom filter implementation from MD5 to murmur.

Your false positive rate will be approximately $(1 - e^{-kn/m})^k$, where k is the number of hash functions and m is the number of bits in the filter. You can plug in the number n of elements you expect to insert and try various values of k and m to configure the filter for your application. If you can't even make a ballpark estimate of the number of elements to be inserted, you may be better off with a hash table or a scalable Bloom filter [4].
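The "plug in n and try values of k and m" step can be sketched in a few lines of Python. The function names `false_positive_rate` and `best_k`, and the search limit `max_k`, are hypothetical helpers introduced here; they only evaluate the approximation quoted above.

```python
import math


def false_positive_rate(n: int, m: int, k: int) -> float:
    """Approximate false positive rate (1 - e^(-kn/m))^k.

    n: expected number of inserted elements
    m: number of bits in the filter
    k: number of hash functions
    """
    return (1.0 - math.exp(-k * n / m)) ** k


def best_k(n: int, m: int, max_k: int = 20) -> int:
    """Try k = 1..max_k and return the k with the lowest estimated rate."""
    return min(range(1, max_k + 1), key=lambda k: false_positive_rate(n, m, k))


if __name__ == "__main__":
    n = 1000  # assumed workload, purely for illustration
    for m in (5_000, 10_000, 20_000):
        k = best_k(n, m)
        print(f"m={m:>6} bits  k={k:>2}  rate~{false_positive_rate(n, m, k):.5f}")
```

With 1,000 expected elements and 10,000 bits, this search settles on k = 7 with an estimated rate a little under 1%. Doubling m lowers the estimate substantially, which is the trade-off between memory and accuracy that this kind of loop lets you explore.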