
Decoding UTF-8 with parallel extract


As a side quest, I recently decided to write a branchless UTF-8 decoder built around the PEXT ("parallel extract") instruction. It is compliant with RFC 3629: rather than naively decoding the code point, it also rejects overlong encodings, encoded surrogate code points (U+D800 to U+DFFF), and similar invalid input.
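To illustrate the core trick, here is a minimal sketch of the extraction and value checks, assuming the sequence length is already known and that four bytes can always be read from `s` (i.e. the input buffer is padded). The names `pext32`, `payload_mask`, `min_cp`, `load_be32` and `decode_cp` are hypothetical and not taken from the original decoder. Loading the bytes big-endian keeps each byte's payload bits in order, so one PEXT with a per-length mask produces the code point directly; the overlong, surrogate and range checks are then plain comparisons combined with `&`. The portable loop in `pext32` exists only so the sketch runs without BMI2; it is neither branchless nor fast.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__BMI2__)
#include <immintrin.h>
#endif

/* PEXT: gather the bits of src selected by mask into the low bits of the
 * result, preserving their order. Uses the BMI2 instruction when compiled
 * with -mbmi2, otherwise a slow portable loop with the same semantics. */
static uint32_t pext32(uint32_t src, uint32_t mask) {
#if defined(__BMI2__)
    return _pext_u32(src, mask);
#else
    uint32_t out = 0;
    for (uint32_t bit = 1; mask != 0; bit <<= 1) {
        uint32_t lowest = mask & (0u - mask); /* lowest set bit of mask */
        if (src & lowest) out |= bit;
        mask &= mask - 1;                     /* clear that bit */
    }
    return out;
#endif
}

/* Payload bits per sequence length, with the first byte in bits 31..24.
 * Index 0 is unused. */
static const uint32_t payload_mask[5] = {
    0, 0x7F000000u, 0x1F3F0000u, 0x0F3F3F00u, 0x073F3F3Fu
};

/* Smallest code point that legitimately needs each length (overlong check). */
static const uint32_t min_cp[5] = { 0, 0x0u, 0x80u, 0x800u, 0x10000u };

/* Big-endian load: s[0] lands in bits 31..24. Assumes 4 readable bytes. */
static uint32_t load_be32(const uint8_t *s) {
    return (uint32_t)s[0] << 24 | (uint32_t)s[1] << 16 |
           (uint32_t)s[2] << 8 | (uint32_t)s[3];
}

/* Extract the code point of a len-byte sequence and set *ok to 1 if it
 * passes the RFC 3629 value checks. Checking the continuation-byte tags
 * is assumed to happen elsewhere. */
static uint32_t decode_cp(const uint8_t *s, int len, int *ok) {
    assert(len >= 1 && len <= 4);
    uint32_t cp = pext32(load_be32(s), payload_mask[len]);
    int not_overlong  = cp >= min_cp[len];
    int not_surrogate = (cp - 0xD800u) >= 0x800u; /* outside D800..DFFF */
    int in_range      = cp <= 0x10FFFFu;
    *ok = not_overlong & not_surrogate & in_range;
    return cp;
}
```

For example, the bytes E2 82 AC decode to U+20AC (€) with `ok` set, whereas ED A0 80 decodes to U+D800 and is rejected as a surrogate, and C0 AF decodes to U+002F but is rejected as overlong.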

The leading byte doesn't need validation, since our length lookup table has already done that, but I redundantly check it again anyway because doing so is practically free. I haven't looked too deeply into it, since the entire premise was to build a decoder around a single (and fast) instruction. Depending on a hardware-specific instruction was already sketchy, but the issue with older AMD Zen cores, where PEXT is microcoded and much slower, puts the final nail in the coffin for me.
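The length lookup and the "practically free" leading-byte re-check might look something like the following sketch. It assumes a 32-entry table indexed by the top five bits of the leading byte; the author's actual table layout isn't shown, and `len_table`, `tag_mask`, `tag_want`, `load_be32` and `checked_len` are hypothetical names. Because the leading byte's prefix bits sit in the same 32-bit word as the continuation bytes' `10xxxxxx` tags, re-checking the leading byte costs nothing beyond a wider mask in the same AND-and-compare.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical length table indexed by (lead >> 3); 0 marks bytes that
 * cannot start a sequence. C0/C1 and F5..F7 still get a length here and
 * are left to the overlong and range checks on the decoded code point. */
static const uint8_t len_table[32] = {
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, /* 0xxxxxxx */
    0, 0, 0, 0, 0, 0, 0, 0,                         /* 10xxxxxx */
    2, 2, 2, 2,                                     /* 110xxxxx */
    3, 3,                                           /* 1110xxxx */
    4,                                              /* 11110xxx */
    0                                               /* 11111xxx */
};

/* With the sequence loaded big-endian, one mask covers the leading byte's
 * prefix and every continuation byte's 10xxxxxx tag. tag_want[0] is 1,
 * which (word & 0) can never equal, so length 0 always fails. */
static const uint32_t tag_mask[5] = {
    0, 0x80000000u, 0xE0C00000u, 0xF0C0C000u, 0xF8C0C0C0u
};
static const uint32_t tag_want[5] = {
    1, 0x00000000u, 0xC0800000u, 0xE0808000u, 0xF0808080u
};

/* Big-endian load: s[0] lands in bits 31..24. Assumes 4 readable bytes. */
static uint32_t load_be32(const uint8_t *s) {
    return (uint32_t)s[0] << 24 | (uint32_t)s[1] << 16 |
           (uint32_t)s[2] << 8 | (uint32_t)s[3];
}

/* Returns the sequence length (1..4), or 0 if the leading byte is invalid
 * or a continuation byte lacks its 10xxxxxx tag. */
static int checked_len(const uint8_t *s) {
    assert(s != NULL);
    uint32_t word = load_be32(s);
    int len = len_table[s[0] >> 3];
    int tags_ok = (word & tag_mask[len]) == tag_want[len];
    return len * tags_ok; /* multiply instead of branching */
}
```

Here E2 82 AC yields 3, while a stray continuation byte such as 0x80, or E2 followed by an ASCII byte, yields 0.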
