I sped up serde_json strings by 20%
I have recently done some performance work and realized that reading about my experience could be entertaining. Teaching how to think is just as important as teaching how to code, but that is seldom done; something I did last month is a great opportunity to draw the curtain a bit.

serde is the Rust framework for serialization and deserialization. Everyone uses it, and it's the de facto standard across the ecosystem. serde_json is the official serde "mixin" for JSON, so when people need to parse something, that's what they reach for instinctively. There are other JSON parsers, like simd-json, but serde_json is overwhelmingly more popular: it has 26916 dependents at the time of this post, compared to only 66 for simd-json.

This makes serde_json a good target (not in a Jia Tan way) for optimization. Chances are, many of those 26916 users would profit from switching to simd-json, but as long as they aren't doing that, smaller optimizations are better than nothing, and such improvements are reaped across the whole ecosystem.
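For readers who haven't touched it, this is roughly what a typical serde_json call site looks like; the Event struct is an invented example, not anything from this post:

```rust
use serde::Deserialize;

// A minimal serde_json round trip: derive Deserialize for a struct and
// parse it straight out of a JSON string. `Event` is a made-up example type.
#[derive(Deserialize, Debug)]
struct Event {
    name: String,
    id: u64,
}

fn main() -> Result<(), serde_json::Error> {
    let event: Event = serde_json::from_str(r#"{ "name": "launch", "id": 42 }"#)?;
    println!("{event:?}");
    Ok(())
}
```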
Also, due to a technicality, a similar regression applied to strings with consecutive escapes, e.g. \r\n or \uD801\uDC37, which are a really common sight in Unicode-ridden data. Unicode defines U+FFFF as a codepoint that does not signify a character, meaning it's extremely unlikely to appear in realistic data, so we can fold any invalid hex digit into the decoded value n = 0xFFFF, branch on n == 0xFFFF afterwards, and re-check whether we should emit an error or the JSON genuinely contained a \uFFFF.
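Here is a minimal sketch of that sentinel idea, assuming invented helper names (serde_json's real implementation differs in detail, e.g. it would use precomputed lookup tables rather than a match): an invalid digit saturates the accumulator to 0xFFFF and stays there, so the hot loop has no per-digit error branch, and the single n == 0xFFFF comparison almost never fires on real data precisely because U+FFFF is a noncharacter.

```rust
/// Hypothetical helper: maps an ASCII hex digit to its value, and anything
/// else to the 0xFFFF sentinel. (Real code would likely use a 256-entry
/// lookup table instead of a match.)
fn hex_value(b: u8) -> u16 {
    match b {
        b'0'..=b'9' => (b - b'0') as u16,
        b'a'..=b'f' => (b - b'a' + 10) as u16,
        b'A'..=b'F' => (b - b'A' + 10) as u16,
        _ => 0xFFFF,
    }
}

/// Decodes the four hex digits of a \uXXXX escape. Saturating arithmetic
/// keeps the sentinel sticky: once an invalid digit pushes the accumulator
/// to 0xFFFF, it stays at 0xFFFF for the remaining digits.
fn decode_hex_escape(digits: [u8; 4]) -> u16 {
    let mut n: u16 = 0;
    for b in digits {
        n = n.saturating_mul(16).saturating_add(hex_value(b));
    }
    n
}

/// The single branch described above: n == 0xFFFF either means some digit
/// was invalid or the JSON genuinely contained \uFFFF, so only then do we
/// re-check the digits the slow, exact way.
fn parse_unicode_escape(digits: [u8; 4]) -> Result<u16, &'static str> {
    let n = decode_hex_escape(digits);
    if n == 0xFFFF && !digits.iter().all(u8::is_ascii_hexdigit) {
        return Err("invalid \\u escape");
    }
    Ok(n)
}

fn main() {
    assert_eq!(parse_unicode_escape(*b"0041"), Ok(0x0041)); // 'A'
    assert_eq!(parse_unicode_escape(*b"FFFF"), Ok(0xFFFF)); // genuine \uFFFF
    assert!(parse_unicode_escape(*b"00G1").is_err()); // bad digit -> error
}
```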