Okay, last I checked, strings were just treated as collections of bytes, and
any multibyte character semantics were up to the programmer to implement.  But
I just noticed that in 1.8.3, utf8string.split(//) yields an array of
strings, each containing a single UTF-8 character, regardless of how many
bytes that character takes.
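For reference, here's roughly what I mean, as a minimal sketch. The sample
string is made up, and I'm assuming the source file is UTF-8. I've put the
/u flag on the regex explicitly so it matches per character rather than per
byte; I'm not sure whether that flag is what's kicking in for me or whether
something else (e.g. $KCODE) is set.

```ruby
# -*- coding: utf-8 -*-
# Hypothetical sample: "a" is 1 byte, "é" is 2 bytes, "€" is 3 bytes in UTF-8.
s = "aé€"

# Empty regex with the /u flag: one array element per UTF-8 character.
chars = s.split(//u)

# Raw byte values of the same string, for comparison (6 bytes total).
bytes = s.unpack("C*")

# Code points decoded from the UTF-8 bytes.
codepoints = s.unpack("U*")

puts chars.length       # 3, one per character
puts bytes.length       # 6
p codepoints            # [97, 233, 8364]
```

The byte count vs. character count is the part that surprised me: split(//)
gives 3 pieces here, not 6.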

So are regexes in general Unicode-aware now?  Any other UTF-8 tidbits
in there I should know about?

Thanks!