Get the latest tech news

OCRing Music from YouTube with Common Lisp

's a tune on YouTube I always really liked called "Supersquatting." It was written by Dubmood and Zabutom, two really masterful chiptune composers, but this track always stood out to me as sounding really full and "fat." For those who don't know, these kinds of chiptunes are usually written in "tracker" software which is a category of music composing program that was popular in the 90s and early 2000s-- they produce "module" files which are self-contained music tracks containing both the note data and samples, which could be played back quickly in a game, keygen, or just for listening. There's a culture surrounding music trackers that's inmeshed with the demoscene, video games, and software cracking, but I digress-- here's a nice in-depth explanation video about it if you're interested.

The point is, this track Supersquatting in particular sounds really full, beyond normally what I considered to be possible with the (these days) rudimentary tools (FastTracker II), so I actually assumed when I first heard it that it must have been made in a more traditional DAW with VSTs and stuff. Looking back, maybe I should have cranked the temperature down, but regardless, this solution is a bit overkill anyway, since it's doing a separate HTTP request to a massive GPU-based model for every little chunk of text, costs a (relative) fortune, and took forever. I wired it all up by having FFmpeg dump out a series of BMP images to a pipe (so I could quickly parse the buffer size and read it into lisp-magick-wand) and set up a parallelized loop to call `classify` and store the parsed-out data.

Get the Android app

Or read this on Hacker News