Okay, last I checked, strings were just treated as collections of bytes, and any multibyte character semantics were up to the programmer to implement. But I just noticed that in 1.8.3, utf8string.split(//) yields an array of strings, each containing a single UTF-8 character, irrespective of byte count. So are regexes in general Unicode-aware now? Are there any other UTF-8 tidbits in there I should know about? Thanks!
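
For reference, here is a minimal sketch of what I'm seeing. The sample string and the variable names are just for illustration, and I'm assuming the interpreter is treating the text as UTF-8 (on 1.8 that may well depend on $KCODE or the -Ku switch, which I haven't pinned down):

```ruby
# "\xC3\xA9" is the two-byte UTF-8 encoding of e-acute,
# so this string holds 5 characters stored in 6 bytes.
s = "h\xC3\xA9llo"

# Splitting on the empty regex: one element per character (what I observed).
chars = s.split(//)

# Unpacking as unsigned bytes: one element per byte, regardless of encoding.
bytes = s.unpack('C*')

puts chars.length
puts bytes.length
```

If the two counts differ, the regex engine is splitting on character boundaries rather than byte boundaries.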