RFC 9839 and Bad Unicode


If you’re designing a data structure or protocol that has text fields, they should contain Unicode characters encoded in UTF-8.

Unpacking all the JSON escaping gibberish reveals that the value of the username field contains four numeric “code points” identifying Unicode characters.

Unicode has a category called “noncharacter”, containing a few dozen code points that, for a variety of reasons, some good, don’t represent anything and must not be interchanged on the wire.

It doesn’t have a version number or release just yet; I’ll wait till a few folk have had a chance to spot any dumb mistakes I probably made.
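
To make the “noncharacter” category concrete, here is a minimal Python sketch that flags those code points in a string. It assumes the Unicode Standard's definition of the 66 noncharacters (U+FDD0 through U+FDEF, plus the last two code points of each of the 17 planes). The function names `is_noncharacter` and `find_noncharacters` are hypothetical, chosen for illustration; they are not taken from the RFC or from any library mentioned in the post.

```python
from __future__ import annotations


def is_noncharacter(cp: int) -> bool:
    """Return True if cp is one of Unicode's 66 noncharacter code points."""
    # U+FDD0..U+FDEF is a contiguous block of 32 noncharacters.
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # The last two code points of every plane: U+FFFE, U+FFFF, U+1FFFE, ...
    return 0 <= cp <= 0x10FFFF and (cp & 0xFFFE) == 0xFFFE


def find_noncharacters(text: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX') for each noncharacter found in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if is_noncharacter(ord(ch))
    ]


if __name__ == "__main__":
    # Hypothetical field value containing two noncharacters.
    print(find_noncharacters("a\ufdd0b\uffff"))
```

A check like this would typically run when a text field is parsed, so that a value containing such code points can be rejected or repaired before it is stored or forwarded. Which of those to do is a design decision this sketch leaves open.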
