Exploring pre-1990 versions of wc(1) (2023)
Can you blame a tool for not supporting a standard X when the tool was written before X was invented? A reasonable person would answer, "You can't" (and add, "You shouldn't"). Still, it can be educational to examine how the absence of X shaped the assumptions of the tool's authors. Take one of the simplest Unix utilities, wc(1).
Obviously, nobody had UTF-8 in 1979 (it was invented in 1992). But if a word is just a sequence of bytes that are not \t, \n, or space, shouldn't even such a prehistoric version of wc count the words in UTF-8 input correctly? Not quite. The wordct variable increments only if the current byte falls within the range between space (040 in octal) and DEL (0177, the last entry in ASCII). The token variable is used cleverly to track whether the previous character was part of a word, but for a word made entirely of bytes above 0177 (which is every byte of a multibyte UTF-8 character) it never changes its value from 0, because v8 Research UNIX considers everything above 0177 JUNK, not worthy of noticing.
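To make the failure mode concrete, here is a minimal C sketch of the counting loop as described above. It is a reconstruction, not the historical source: the function name `count_words_v7` is hypothetical, and the exact range check (a byte is a word character only if it lies strictly between 040 and 0177) is an assumption based on the description.

```c
#include <stddef.h>

/* Hypothetical reconstruction of the early wc word-counting logic.
 * ASSUMPTION: a byte is a word character only if 040 < c < 0177;
 * the exact bounds in the historical source may differ. */
long count_words_v7(const unsigned char *buf, size_t len)
{
    long wordct = 0;
    int token = 0;  /* 1 while the previous byte belonged to a word */

    for (size_t i = 0; i < len; i++) {
        int c = buf[i];

        if (c > 040 && c < 0177) {
            /* Printable ASCII: the first one after a separator starts a word. */
            if (!token) {
                wordct++;
                token = 1;
            }
            continue;
        }

        /* Only newline, space and tab end a word. Every other byte,
         * including all bytes >= 0200 used by multibyte UTF-8 characters,
         * is skipped and leaves token unchanged. */
        if (c != '\n' && c != ' ' && c != '\t')
            continue;
        token = 0;
    }
    return wordct;
}
```

Under these assumptions, `hello world\n` yields 2 words, and a line of Cyrillic text such as "при мир" yields 0, because none of its bytes is below 0200. By contrast, `café bar` still yields 2, since the ASCII letters before the é start the word and the non-ASCII bytes are simply ignored.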