
RFC 9839 and Bad Unicode

If you’re designing a data structure or protocol that has text fields, they should contain Unicode characters encoded in UTF-8.

Unpacking all the JSON escaping gibberish reveals that the value of the username field contains four numeric “code points” identifying Unicode characters.

Unicode has a category called “noncharacter”, containing a few dozen code points that, for a variety of reasons, some good, don’t represent anything and must not be interchanged on the wire.

It doesn’t have a version number or release just yet; I’ll wait till a few folk have had a chance to spot any dumb mistakes I probably made.
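To make the “noncharacter” category concrete, here is a minimal Python sketch that flags those code points in a decoded JSON field. The Unicode Standard defines 66 noncharacters: the range U+FDD0 to U+FDEF, plus the last two code points of each of the 17 planes. The function names `is_noncharacter` and `find_noncharacters` and the sample payload are made up for illustration and are not taken from the post or from any library it describes.

```python
import json


def is_noncharacter(cp: int) -> bool:
    """Return True if cp is one of Unicode's 66 noncharacter code points."""
    if not 0 <= cp <= 0x10FFFF:
        return False
    # U+FDD0..U+FDEF: a contiguous block of 32 noncharacters.
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # The last two code points of every plane: U+FFFE, U+FFFF, U+1FFFE, ...
    return (cp & 0xFFFE) == 0xFFFE


def find_noncharacters(text: str) -> list[tuple[int, str]]:
    """List (index, 'U+XXXX') for each noncharacter found in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if is_noncharacter(ord(ch))
    ]


# Hypothetical payload: the JSON escape \uFFFE decodes to a noncharacter.
payload = '{"username": "al\\uFFFEice"}'
username = json.loads(payload)["username"]
print(find_noncharacters(username))  # [(2, 'U+FFFE')]
```

A real validator would likely also want to reject surrogates and legacy control codes; this sketch covers only the noncharacter category mentioned above.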
