The WTF-8 Encoding


To the extent possible under law, the editors have waived all copyright and related or neighboring rights to this work.

WTF-8 (Wobbly Transformation Format − 8-bit) is a superset of UTF-8 that encodes surrogate code points if they are not in a pair.
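As a rough illustration of that definition, the sketch below encodes a sequence of 16-bit code units (possibly ill-formed UTF-16) to bytes. Properly paired surrogates are combined into one supplementary code point and encoded as four bytes, exactly as in UTF-8; an unpaired surrogate is pushed through the same bit-packing algorithm and comes out as three bytes. The function names `encode_code_point` and `wtf8_encode` are hypothetical, chosen for this example, and the code is a minimal sketch rather than a reference implementation.

```python
def encode_code_point(cp):
    """Pack one code point (surrogates allowed) using the UTF-8 bit layout."""
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


def wtf8_encode(code_units):
    """Encode 16-bit code units (potentially ill-formed UTF-16) as bytes."""
    out = bytearray()
    i = 0
    n = len(code_units)
    while i < n:
        unit = code_units[i]
        i += 1
        is_lead = 0xD800 <= unit <= 0xDBFF
        if is_lead and i < n and 0xDC00 <= code_units[i] <= 0xDFFF:
            # Lead surrogate followed by a trail surrogate: a valid pair,
            # so combine the two units into one supplementary code point.
            cp = 0x10000 + ((unit - 0xD800) << 10) + (code_units[i] - 0xDC00)
            i += 1
        else:
            # Either a non-surrogate BMP code point or an unpaired
            # surrogate; both are encoded as they are.
            cp = unit
        out += encode_code_point(cp)
    return bytes(out)


# A surrogate pair becomes ordinary four-byte UTF-8.
print(wtf8_encode([0xD83D, 0xDCA9]).hex())  # f09f92a9
# A lone lead surrogate becomes a three-byte sequence.
print(wtf8_encode([0xD83D]).hex())          # eda0bd
```

For a lone surrogate the result matches what Python's built-in `"\ud83d".encode("utf-8", "surrogatepass")` produces. The built-in handler works on Python strings rather than on 16-bit code units, so it is not a drop-in substitute for the pairing step shown here.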

It represents, in a way compatible with UTF-8, text from systems such as JavaScript and Windows that use UTF-16 internally but don’t enforce the well-formedness invariant that surrogates must be paired. For the purpose of this specification, generalized UTF-8 is an encoding of sequences of code points (not restricted to Unicode scalar values) using 8-bit bytes, based on the same underlying algorithm as UTF-8.

Thanks for feedback and contributions from Anne van Kesteren, David Baron, Dylan Petonke, Guillaume Knispel, Henri Sivonen, Jacob Lifshay, James Graham, Lily Ballard, Mathias Bynens, Ms2ger, Sam Tobin-Hochstadt, Tab Atkins.
