
You can't just assume UTF-8


How to infer character encodings with statistics

The same statistics can often tell you which language the document is in; that's one way web browsers know to pop up the "do you want to translate this page" dialogue box. In a sense, auto-detecting text encoding is an instance of Postel's Law: "be conservative in what you do, be liberal in what you accept from others". Perhaps there is a case for making csvbase's auto-detection a more explicit user-interface action than a pre-selected combo-box.
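To make the idea of inferring an encoding with statistics concrete, here is a minimal sketch. It is not how csvbase or any particular detection library works. It assumes a tiny hand-written reference sample and a fixed list of three candidate encodings, and the names `REFERENCE_SAMPLE`, `CANDIDATES`, `score` and `guess_encoding` are all hypothetical. Each candidate is tried in turn, and the decoding whose characters look most like the reference sample wins.

```python
import math
from collections import Counter

# Assumption: a toy stand-in for a real per-language frequency table.
REFERENCE_SAMPLE = (
    "the quick brown fox jumps over the lazy dog; "
    "café, naïve, résumé, façade, über"
)

# Assumption: an illustrative candidate list, tried in this order.
CANDIDATES = ("utf-8", "cp1252", "mac_roman")


def score(text, counts):
    """Mean log-probability of the characters, with add-one smoothing."""
    if not text:
        return float("-inf")
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot for unseen characters
    return sum(
        math.log((counts[ch] + 1) / (total + vocab)) for ch in text
    ) / len(text)


def guess_encoding(data, candidates=CANDIDATES):
    """Return the candidate whose decoding best matches the reference."""
    counts = Counter(REFERENCE_SAMPLE)
    best, best_score = None, float("-inf")
    for encoding in candidates:
        try:
            text = data.decode(encoding)
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this encoding
        candidate_score = score(text, counts)
        # Strict comparison: on a tie, the earlier candidate is kept.
        if best is None or candidate_score > best_score:
            best, best_score = encoding, candidate_score
    return best  # None if no candidate could decode the bytes
```

For example, `guess_encoding("a naïve café résumé".encode("cp1252"))` picks `"cp1252"`: the bytes are not valid UTF-8, and the Mac Roman decoding produces characters that never occur in the reference sample. A real detector would use far larger frequency tables and many more candidates; this only shows the shape of the approach.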
