How to chop off bytes of a UTF-8 string to fit into a small slot and look nice


While researching a very weird bug in Koha I had to figure out a way to chop a string to a specific maximum length. The limit is in bytes and not in characters, because in that case the horrible format USMARC is used, whose spec starts with two red flags: it's from January 2000, and it's an "implementation of the American national standard", so you can bet that it only works (well) with ASCII and will be ... interesting when handling Unicode. Older formats (like ASCII) used a fixed length (e.g. one byte = 8 bits), but could therefore only represent a limited number of characters.
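As a rough illustration of the byte-versus-character problem, here is a minimal Python sketch of chopping a string to a byte budget without cutting a multi-byte character in half. It is not the code used in Koha (which is written in Perl); the function name `truncate_utf8` and its parameters are made up for this example. It relies only on the fact that UTF-8 continuation bytes always have the bit pattern `10xxxxxx`.

```python
def truncate_utf8(text: str, max_bytes: int) -> str:
    """Return the longest prefix of `text` whose UTF-8 encoding fits in `max_bytes`.

    Hypothetical helper for illustration; not taken from Koha.
    """
    if max_bytes <= 0:
        return ""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text  # already fits, nothing to chop

    cut = max_bytes
    # If the byte at the cut position is a continuation byte (0b10xxxxxx),
    # the cut would split a character, so step back to that character's start.
    while cut > 0 and (encoded[cut] & 0xC0) == 0x80:
        cut -= 1
    return encoded[:cut].decode("utf-8")


if __name__ == "__main__":
    word = "Käse"  # 4 characters, 5 bytes in UTF-8 ("ä" takes 2 bytes)
    for limit in range(6):
        print(limit, repr(truncate_utf8(word, limit)))
```

Note that this only respects code point boundaries. A base letter followed by a combining accent can still be separated, so truly "nice looking" output may need grapheme-aware handling on top of this.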

Related news:

China finds poor data storage leads to waste, as AI, satellites and self-driving cars generate masses of bytes

You probably don't need to validate UTF-8 strings

Show HN: Attempt to bring a cinematic experience in 256 bytes (WASM)