
Decoding UTF-8. Part III: Determining Sequence Length – A Lookup Table


Avoiding branching with the help of a lookup table

In part two, we saw how to determine the sequence length and mentioned that there are ways to reduce branching. One of them is a lookup table indexed by the lead byte. Avoiding branching doesn’t come for free: a 256-byte array is added to the .rodata section, which may negatively affect caching. In the next post, we’ll try to figure out a way to reduce branching without using a lookup table.
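As a rough illustration of the idea, here is a minimal sketch in C of a lookup table keyed by the lead byte. It is not the code from the original post: the names `utf8_len_table` and `utf8_sequence_length` and the `R4` to `R128` helper macros are made up for this example. The sketch also assumes that classification is done purely on the bit pattern of the lead byte, so it returns 0 for bytes that cannot start a sequence, but it does not reject lead bytes such as 0xC0, 0xC1, or 0xF5 to 0xF7 that a strict decoder would refuse later.

```c
#include <stdint.h>

/* Hypothetical helper macros that repeat a value N times, so the
   256-entry table can be written in standard C without extensions. */
#define R4(v)   v, v, v, v
#define R8(v)   R4(v), R4(v)
#define R16(v)  R8(v), R8(v)
#define R32(v)  R16(v), R16(v)
#define R64(v)  R32(v), R32(v)
#define R128(v) R64(v), R64(v)

/* One entry per possible lead byte; a const array like this is what
   typically ends up in the .rodata section. */
static const uint8_t utf8_len_table[256] = {
    R128(1), /* 0x00..0x7F: 0xxxxxxx, single-byte (ASCII) */
    R64(0),  /* 0x80..0xBF: 10xxxxxx, continuation byte, not a valid lead */
    R32(2),  /* 0xC0..0xDF: 110xxxxx, lead of a two-byte sequence */
    R16(3),  /* 0xE0..0xEF: 1110xxxx, lead of a three-byte sequence */
    R8(4),   /* 0xF0..0xF7: 11110xxx, lead of a four-byte sequence */
    R8(0)    /* 0xF8..0xFF: not a valid lead byte */
};

/* Returns the sequence length implied by the lead byte, or 0 if the
   byte cannot start a sequence. A single indexed load, no branches. */
static int utf8_sequence_length(unsigned char lead)
{
    return utf8_len_table[lead];
}
```

A call such as `utf8_sequence_length(0xE2)` yields 3, since 0xE2 matches the 1110xxxx pattern. The trade-off is the one described above: the branches disappear, but the 256 bytes of table data compete for cache space with everything else the program touches.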
