MonkeeSage wrote:
> Ruby 1.8 doesn't have unicode support (1.9 is starting to get it).

It enrages me to see this kind of FUD. Through regular expressions, Ruby 
1.8 has 80-90% complete UTF-8 support. And Oniguruma makes UTF-8 support 
well-nigh 100% complete.

 >> 'aébvHögtåwHÅFuG'.scan(/./u)
=> ["a", "é", "b", "v", "H", "ö", "g", "t", "å", "w", "H", "Å", "F", 
"u", "G"]

 >> 'aébvHögtåwHÅFuG'.scan(/[éöåÅ]/u)
=> ["é", "ö", "å", "Å"]

OK, sometimes you have to take a weird approach because of the missing 
10-20%, but it's still workable:
 >> 'abvHgtwHFuG'.scan(/(?:\303\251|\303\266|\303\245|\303\205)/u)
=> ["é", "ö", "å", "Å"]
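
If you want to try the same idea without fighting your terminal's encoding, here is a rough sketch. It assumes Ruby 1.9 or later (for the "\u" escapes), and the sample string is my own stand-in: ASCII letters mixed with the four two-byte letters spelled out as octal bytes above.

```ruby
# Sketch only: assumes Ruby 1.9 or later, where "\u" escapes exist.
# Sample string: ASCII letters mixed with e-acute, o-umlaut,
# a-ring and capital A-ring (two bytes each in UTF-8).
s = "a\u00E9bvH\u00F6gt\u00E5wH\u00C5FuG"

# /./u walks the string one UTF-8 character at a time.
chars = s.scan(/./u)

# A character class picks out just the four accented letters.
accented = s.scan(/[\u00E9\u00F6\u00E5\u00C5]/u)

puts chars.size      # number of characters
puts accented.size   # number of accented letters
puts s.bytesize      # number of bytes underneath
```

The character count and the byte count differ by exactly one byte per accented letter, which is the whole point: the bytes are the string, and the regex gives you the character view when you ask for it.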

> Everything in ruby is a bytestring.

YES! And that's exactly how it should be. Who is it that spread the 
flawed idea that strings are fundamentally made of characters? I'd like 
to slap him around a little. Fundamentally, ever since the word "string" 
was applied to computing, strings were made of 8-BIT CHARS, not n-bit 
characters. If only the creators of C had called that datatype "byte" 
instead of "char" it would have saved us so many misunderstandings.

Usually the complaint about the lack of Unicode support is that 
something like "日本語".length returns 9 instead of 3, or that 
"日本語".index("語") returns 6 instead of 2. It's nice that people want to 
completely redefine the API to return character positions and all that, 
but please don't complain that it's broken just because you happen to be 
using it incorrectly. Use the right tool for the job: SQL for database 
queries, non-home-brewed crypto libraries for security, regular 
expressions for string manipulation.
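
To make that concrete, here is a minimal sketch of getting the character-based answers out of regular expressions. The assumptions are mine: it is written for a newer interpreter (Ruby 2.0 or later), the variable names are made up for illustration, and String#b is used only to imitate the 1.8 byte view there.

```ruby
# Sketch only: assumes Ruby 2.0+ ("\u" escapes and String#b).
s      = "\u65E5\u672C\u8A9E"   # the three-character word from the text
needle = "\u8A9E"               # its last character

# Byte view, i.e. what Ruby 1.8 reports for length and index.
byte_len = s.bytesize
byte_idx = s.b.index(needle.b)  # String#b gives a raw-bytes copy

# Character view, obtained through regular expressions.
char_len = s.scan(/./u).size

# Character index: count the characters in front of the match.
m = /#{Regexp.escape(needle)}/u.match(s)
char_idx = m.pre_match.scan(/./u).size

puts byte_len, byte_idx, char_len, char_idx
```

Counting the characters in the pre-match is one way to do it, not the only one. The byte offsets stay available for anything that really is about bytes, such as I/O.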

I'm terribly sorry for the rant but I had to get it off my chest.

Dan