Toward a Sparse and Interpretable Audio Codec


Most widely used modern audio codecs, such as Ogg Vorbis and MP3, as well as more recent "neural" codecs like Meta's Encodec or the Descript Audio Codec, are based on block coding: audio is divided into overlapping, fixed-size "frames", which are then compressed. While these codecs often yield excellent reproductions and can be used for downstream tasks such as text-to-audio, they do not produce an intuitive, directly interpretable representation. In this work, we introduce a proof-of-concept audio encoder that represents audio as a sparse set of events and their times of occurrence. Rudimentary physics-based assumptions are used to model the attack and the physical resonance of both the instrument being played and the room in which a performance occurs, with the aim of encouraging a sparse, parsimonious, and easy-to-interpret representation.
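To make the idea of "events plus times of occurrence" concrete, the following is a minimal decoding sketch. It is not the paper's model: the event format `(time_sec, freq_hz, amplitude)`, the function names, the sample rate, the noise-burst attack, the single damped sinusoid used as a resonance, and the decaying-noise room response are all assumptions chosen for illustration. It only shows how a short list of events could be rendered to audio by convolving an attack with an instrument resonance and then with a room impulse response.

```python
import numpy as np

SAMPLE_RATE = 22050  # assumed sample rate; not taken from the paper


def damped_resonance(freq_hz, decay_per_sec, n_samples):
    """Exponentially decaying sinusoid: a toy stand-in for an instrument resonance."""
    t = np.arange(n_samples) / SAMPLE_RATE
    return np.exp(-decay_per_sec * t) * np.sin(2 * np.pi * freq_hz * t)


def toy_room_ir(rng, n_samples=4410, decay_per_sec=20.0):
    """Decaying noise tail: a toy stand-in for a room impulse response."""
    t = np.arange(n_samples) / SAMPLE_RATE
    ir = rng.standard_normal(n_samples) * np.exp(-decay_per_sec * t)
    ir[0] = 1.0  # direct sound arrives first
    return ir / np.sum(np.abs(ir))


def render_event(freq_hz, amplitude, rng, attack_samples=256, resonance_samples=8192):
    """One event: a windowed noise burst (attack) convolved with a resonance."""
    attack = rng.standard_normal(attack_samples) * np.hanning(attack_samples)
    resonance = damped_resonance(freq_hz, decay_per_sec=6.0, n_samples=resonance_samples)
    sound = np.convolve(attack, resonance)
    return amplitude * sound / np.max(np.abs(sound))


def decode(events, duration_sec, room_ir, seed=0):
    """Place each (time_sec, freq_hz, amplitude) event on a timeline, then apply the room."""
    rng = np.random.default_rng(seed)
    n_total = int(duration_sec * SAMPLE_RATE)
    dry = np.zeros(n_total)
    for time_sec, freq_hz, amplitude in events:
        start = int(time_sec * SAMPLE_RATE)
        if not 0 <= start < n_total:
            continue  # skip events that fall outside the rendered window
        sound = render_event(freq_hz, amplitude, rng)
        end = min(n_total, start + len(sound))
        dry[start:end] += sound[: end - start]
    # The room is shared by all events, so it is applied once to the dry mix.
    return np.convolve(dry, room_ir)[:n_total]


# Two events are the entire "encoding" of one second of audio in this toy setup.
events = [(0.1, 220.0, 1.0), (0.5, 440.0, 0.5)]
room_ir = toy_room_ir(np.random.default_rng(1))
audio = decode(events, duration_sec=1.0, room_ir=room_ir)
```

In this sketch the representation is interpretable in the sense the abstract describes: each tuple says when something happened and what it sounded like, while the room is factored out as a separate, shared component. A real encoder would also have to infer the events from audio, which this example does not attempt.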

By John Vinyard