Wikidata as a Giant Crosswalk File
Let’s build a massive crosswalk connecting map data with just Wikidata, DuckDB, some Ruby, and a hard-won bash one-liner.
Today we’re going to build a crosswalk table for places (a topic near and dear to my heart) using just DuckDB, a short Ruby script, and one hard-earned bash line.

Wikidata’s JSON dump puts each item on its own line. However, for reasons unknown to me, they wrap these neatly separated rows with brackets ([ and ]) and add a comma to each line so it’s a valid JSON array containing 100+ million items. The hard-won line streams the uncompressed content into sed, which removes the trailing commas, then chunks the output into batches of 100,000 records, which are finally gzipped into files.
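To make that pipeline concrete, here is a minimal sketch of what such a line could look like. It is not the original one-liner. The function name `chunk_dump`, the output prefix, and the choice to delete the lone bracket lines are my assumptions, and it relies on GNU `split` for `--filter`, which the BSD and macOS versions lack.

```shell
# Hypothetical helper (not the article's original line).
# Reads a Wikidata-style JSON array on stdin: one item per line, a trailing
# comma on each row, and lone "[" and "]" lines wrapping the whole thing.
# Writes gzipped, newline-delimited chunks named <prefix>aa.gz, <prefix>ab.gz, ...
chunk_dump() {
  local prefix="$1"
  local lines_per_chunk="${2:-100000}"

  # 1. Drop the lines that hold only the opening or closing bracket.
  # 2. Strip the trailing comma from every remaining row.
  # 3. Cut the stream into fixed-size batches; GNU split runs the filter once
  #    per batch and exposes the batch's file name to it as $FILE.
  sed -e '/^\[$/d' -e '/^]$/d' -e 's/,$//' \
    | split -l "$lines_per_chunk" --filter='gzip > "$FILE.gz"' - "$prefix"
}
```

Assuming the dump was downloaded as `latest-all.json.bz2`, a run might look like `bzcat latest-all.json.bz2 | chunk_dump wikidata_ 100000`. Because everything streams through pipes, the uncompressed dump never has to sit on disk in one piece.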