
Wikidata as a Giant Crosswalk File


Let’s build a massive crosswalk connecting map data with just Wikidata, DuckDB, some Ruby, and a hard-won bash one-liner.

Today we’re going to build a crosswalk table for places (a topic near and dear to my heart) with just DuckDB, a short Ruby script, and one hard-earned bash line. The Wikidata JSON dump puts each entity on its own line. However, for reasons unknown to me, the dump wraps these neatly separated rows in brackets ([ and ]) and adds a comma to the end of each line, so the whole file is a valid JSON array containing 100+ million items. The hard-won bash line streams the uncompressed content into sed, which removes the trailing commas, then chunks the output into batches of 100,000 records, which are finally gzipped into files.
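
The excerpt describes that pipeline without showing it, so here is a minimal sketch of what such a line might look like. It is not the author’s actual one-liner. It assumes GNU sed and GNU split (the `--filter` option is not available in BSD/macOS split), and the function name `split_dump`, the output prefix, and the dump filename in the usage comment are all illustrative. It also drops the lone `[` and `]` lines, which the description above does not mention explicitly.

```shell
#!/usr/bin/env bash
# Sketch only: reads a Wikidata-style JSON array on stdin (one entity per
# line, wrapped in "[" and "]" lines, with trailing commas) and writes
# gzipped newline-delimited JSON batches to the current directory.
# Assumes GNU sed and GNU split; names below are hypothetical.
split_dump() {
  local batch_size="${1:-100000}"  # records per output file
  local prefix="${2:-wikidata-}"   # hypothetical output file prefix

  # 1. delete the lines holding only the opening/closing bracket
  # 2. strip the trailing comma from each entity line
  # 3. split into batches, gzipping each batch as it is written
  #    (split sets $FILE to the name of the chunk being produced)
  sed -e '/^\[$/d' -e '/^\]$/d' -e 's/,$//' \
    | split -l "$batch_size" -d -a 5 \
        --filter='gzip > "$FILE.json.gz"' - "$prefix"
}

# Example usage (assumed dump filename):
#   bzcat latest-all.json.bz2 | split_dump 100000 wikidata-
```

Each resulting file holds one JSON object per line, which is a shape that newline-delimited JSON readers can consume batch by batch without parsing one enormous array.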
