Index 1.6B Keys with Automata and Rust (2015)
I blog mostly about my own programming projects.
Along the way, we will talk about memory maps, automaton intersection with regular expressions, fuzzy searching with Levenshtein distance and streaming set operations.

50,000,000 keys is big, and compressing them in merely 27 seconds is a nice result, but I don't see it as a representative example of real workloads. It was unfortunate that the biggest data set I had to try with FSTs also happened to be a near-best-case scenario.

My hope is that this article taught you a little something about using finite state machines as a data structure, one that can store a large number of keys in a small amount of space while remaining easily searchable.
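To make the idea of a finite state machine as a key store a bit more concrete, here is a minimal sketch in Rust. It is not the article's implementation: it is a plain trie, the simplest deterministic acyclic automaton, which shares only key prefixes. A production FST would typically also share suffixes and use a compact byte encoding. The names `State` and `TrieSet` are hypothetical and chosen for illustration.

```rust
use std::collections::BTreeMap;

/// One state of the automaton: outgoing byte transitions plus a flag
/// marking whether a key ends here. (Hypothetical type for illustration.)
#[derive(Default)]
struct State {
    transitions: BTreeMap<u8, usize>,
    is_final: bool,
}

/// A set of keys stored as a deterministic acyclic automaton (a trie).
/// State 0 is the start state. (Hypothetical type for illustration.)
struct TrieSet {
    states: Vec<State>,
}

impl TrieSet {
    fn new() -> Self {
        TrieSet { states: vec![State::default()] }
    }

    /// Walk the key byte by byte, creating states where no transition exists.
    fn insert(&mut self, key: &str) {
        let mut cur = 0;
        for &b in key.as_bytes() {
            let existing = self.states[cur].transitions.get(&b).copied();
            cur = match existing {
                Some(next) => next,
                None => {
                    let next = self.states.len();
                    self.states.push(State::default());
                    self.states[cur].transitions.insert(b, next);
                    next
                }
            };
        }
        self.states[cur].is_final = true;
    }

    /// A key is in the set if following its bytes ends in a final state.
    fn contains(&self, key: &str) -> bool {
        let mut cur = 0;
        for &b in key.as_bytes() {
            match self.states[cur].transitions.get(&b) {
                Some(&next) => cur = next,
                None => return false,
            }
        }
        self.states[cur].is_final
    }
}
```

Inserting "jan", "jul" and "jun" into a `TrieSet` creates seven states rather than ten, because the prefixes "j" and "ju" are stored once. Lookup time depends on the key length, not on how many keys are stored.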