Daniel DeLorme said...
> MonkeeSage wrote:
> > Ruby 1.8 doesn't have unicode support (1.9 is starting to get it).
> 
> It enrages me to see this kind of FUD. Through regular expressions, ruby 
> 1.8 has 80-90% complete utf8 support. And oniguruma makes utf8 support 
> well-nigh 100% complete.
> 
> 
> > Everything in ruby is a bytestring.
> 
> YES! And that's exactly how it should be. Who is it that spread the 
> flawed idea that strings are fundamentally made of characters?

Are you being ironic?

> I'd like 
> to slap him around a little. Fundamentally, ever since the word "string" 
> was applied to computing, strings were made of 8-BIT CHARS, not n-bit 
> characters. If only the creators of C had called that datatype "byte" 
> instead of "char" it would have saved us so many misunderstandings.

And look at the trouble we're having ditching the waterfall method, all 
because someone misread a paper in the 1700s or thereabouts.

You might want to spar with Tim Bray from Sun who presented at RubyConf 
2006, where his slides state:

"99.99999% of the time, programmers want to deal with characters not  
bytes. I know of one exception: running a state machine on UTF8-encoded 
text. This is done by the Expat XML parser."

"In 2006, programmers around the world expect that, in modern languages, 
strings are Unicode and string APIs provide Unicode semantics correctly 
& efficiently, by default. Otherwise, they perceive this as an offense 
against their language and their culture. Humanities/computing academics 
often need to work outside Unicode. Few others do."
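
To make the bytes-versus-characters distinction concrete, here is a minimal 
sketch in Ruby. It assumes a 1.9-or-later interpreter; the sample word and 
variable names are mine, not taken from the slides. It also shows the 
regex-based approach mentioned above, which is one way to walk UTF-8 
characters when strings are treated as bytes.

```ruby
# encoding: utf-8
# Illustrative only: the sample word and names are assumptions, not from the slides.

word = "r\u00e9sum\u00e9"          # "résumé"; each é is 2 bytes in UTF-8

byte_count  = word.bytesize         # bytes in the encoded string   => 8
char_count  = word.chars.to_a.size  # characters (code points)      => 6

# Regex-based splitting with the /u (UTF-8) flag, the idiom available
# to byte-oriented strings:
regex_chars = word.scan(/./mu)      # one array element per character

# Reversing character-wise keeps the text valid; reversing byte-wise
# splits the multi-byte sequences and yields invalid UTF-8.
char_reversed = word.reverse
byte_reversed = word.bytes.to_a.reverse.pack("C*").force_encoding("UTF-8")

puts byte_count
puts char_count
puts regex_chars.size
puts char_reversed.valid_encoding?
puts byte_reversed.valid_encoding?
```

The byte-wise reversal is the kind of operation where treating a string as 
a plain byte sequence silently corrupts the text.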

He reviews his talk here:

  http://www.tbray.org/ongoing/When/200x/2006/10/22/Unicode-and-Ruby

and the slides are here:

  http://www.tbray.org/talks/rubyconf2006.pdf

-- 
Cheers,
Marc