Libpostal: C library for parsing/normalizing street addresses around the world
A C library for parsing and normalizing street addresses around the world, powered by statistical NLP and open geo data. Repository: openvenues/libpostal
Language classification: multinomial logistic regression, trained with the FTRL-Proximal method to induce sparsity, on all of OpenStreetMap ways, addr:* tags, toponyms and formatted addresses.

Obviously '30 W 26th St Fl #7' != '30 West Twenty-sixth Street Floor Number 7' in a string comparison sense, but a human can grok that these two addresses refer to the same physical location. libpostal aims to create normalized geographic strings, parsed into components, so that we can reason more effectively about how well two addresses actually match and make automated server-side decisions about dupes.
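As a rough illustration of why normalization helps with matching, here is a minimal Python sketch that maps both example addresses to the same canonical string. It does not use libpostal and is not how libpostal works internally: the abbreviation table, the number-word tables, and the function names (`normalize`, `normalize_token`, `ordinal_suffix`) are all hypothetical and cover only this one example.

```python
import re

# Hypothetical, tiny lookup tables for illustration only. Real address
# dictionaries are far larger, and "st" can also mean "saint".
ABBREVIATIONS = {"w": "west", "st": "street", "fl": "floor", "#": "number"}
ONES = {"first": 1, "second": 2, "third": 3, "fourth": 4, "fifth": 5,
        "sixth": 6, "seventh": 7, "eighth": 8, "ninth": 9}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}


def ordinal_suffix(n):
    """Return the English ordinal suffix for n (1 -> 'st', 26 -> 'th')."""
    if 10 <= n % 100 <= 20:
        return "th"
    return {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")


def normalize_token(tok):
    """Expand one lowercase token into its canonical form."""
    if tok in ABBREVIATIONS:
        return ABBREVIATIONS[tok]
    # Spelled-out ordinals such as "sixth" or "twenty-sixth" become "6th", "26th".
    tens, _, ones = tok.rpartition("-")
    if ones in ONES and (tens == "" or tens in TENS):
        n = TENS.get(tens, 0) + ONES[ones]
        return f"{n}{ordinal_suffix(n)}"
    return tok


def normalize(address):
    """Lowercase, tokenize, and expand every token of an address string."""
    tokens = re.findall(r"#|[a-z0-9]+(?:-[a-z0-9]+)*", address.lower())
    return " ".join(normalize_token(t) for t in tokens)


a = normalize("30 W 26th St Fl #7")
b = normalize("30 West Twenty-sixth Street Floor Number 7")
print(a)       # 30 west 26th street floor number 7
print(a == b)  # True
```

A raw string comparison of the two inputs fails, while comparing the normalized forms succeeds. A hard-coded table like this breaks down quickly on ambiguous abbreviations and other languages, which is the gap a statistical approach is meant to close.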