A Princeton professor, finding a little time for himself in the summer academic lull, emailed an old friend a couple of months ago. Brian Kernighan said hello, asked how his friend's US visit was going, and dropped off hundreds of lines of code that could add Unicode support to AWK, the text-parsing tool he helped create for Unix at Bell Labs in 1977.
"I have tested this a fair amount but clearly more tests are needed," Kernighan wrote in the email, posted in late May as a kind of pseudo-commit on the onetrueawk repo by longtime maintainer Arnold Robbins. "Once I figure out how ... I will try to submit a pull request. I wish I understood git better, but in spite of your help, I still don't have a proper understanding, so this may take a while."
Kernighan is the "K" in AWK, a special-purpose language for extracting and manipulating text that was key to Unix's pipeline features and interoperability between systems. A working awk utility (AWK is the language, awk the command to invoke it) is critical to both Single UNIX Specification and IEEE POSIX certification for interoperability. There are countless variants of awk, including modern derivations with support for Unicode, but "One True AWK," sometimes known as nawk, is a kind of canonical version based on the 1988 book The AWK Programming Language, which Kernighan co-wrote, and his subsequent input.