Overflow in consistent hashing (2018)
Ryan Marcus, assistant professor at the University of Pennsylvania (Fall '23). Using machine learning to build the next generation of data systems.
Consistent hashing was first proposed in 1997 by David Karger et al., and is used today in many large-scale data management systems, including (for example) Apache Cassandra.

When our bin capacity is large and our load factor is not too close to one, the tail of the distribution collapses rapidly, so we can approximate this sum by taking just its first few terms (and applying some algebra).

Compare this to the blue curve, representing 200 bins: if I have the same node capacity (20) and the same load factor of 0.8 (implying that I have 3200 items), my overflow probability is practically one.
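To make those numbers concrete, here is a minimal Python sketch (mine, not from the original post) under the usual assumptions: items hash uniformly and independently into bins, so the count in any one bin is binomial, and bins are treated as approximately independent when combining them. The parameter names and the ten-term tail truncation are illustrative choices.

```python
from scipy.stats import binom

BINS = 200          # number of bins (the "blue curve" case above)
CAPACITY = 20       # per-bin (node) capacity
LOAD_FACTOR = 0.8   # fraction of total capacity in use

n_items = int(BINS * CAPACITY * LOAD_FACTOR)  # 3200 items

# Items landing in a given bin ~ Binomial(n_items, 1/BINS); the bin
# overflows when it receives more than CAPACITY items. sf(k) = P(X > k).
p_bin_overflow = binom.sf(CAPACITY, n_items, 1 / BINS)

# The tail collapses quickly, so summing just the first few terms past
# CAPACITY already recovers almost all of the per-bin overflow probability:
p_truncated = sum(binom.pmf(i, n_items, 1 / BINS)
                  for i in range(CAPACITY + 1, CAPACITY + 11))

# Treating bins as (approximately) independent, the chance that at least
# one of the BINS bins overflows:
p_any_overflow = 1 - (1 - p_bin_overflow) ** BINS

print(f"P(one bin overflows), exact tail:      {p_bin_overflow:.4f}")
print(f"P(one bin overflows), first 10 terms:  {p_truncated:.4f}")
print(f"P(any bin overflows):                  {p_any_overflow:.6f}")
```

With these parameters the per-bin tail comes out to roughly 0.13, so across 200 bins an overflow somewhere is near-certain, which is the "practically one" behavior described above.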