David Flanagan wrote:
> Sam Ruby wrote:
>> I've tried porting a few small codebases, and a few experiments, and
>> documented some of my findings here:
>>
>> http://intertwingly.net/blog/2007/12/28/3-1-2
>>
>> - Sam Ruby
>
> Here's the response I left on Sam's blog:

I responded, also on my weblog.

I would appreciate an explanation (or a pointer to the documentation) of
the apparently inconsistent results in my table:

http://intertwingly.net/stories/2007/12/28/hearts.rb
http://intertwingly.net/stories/2007/12/28/hearts.html

An explanation of the differences between the following two would be
appreciated as well:

http://intertwingly.net/stories/2007/12/28/test1.rb
http://intertwingly.net/stories/2007/12/28/test2.rb

> Sam,
>
> It sounds like your complaint is with Array.pack and the rexml library,
> not with all of Unicode in Ruby 1.9.
>
> Given that the point of Array.pack is to serialize data into byte
> strings, I think its behavior is probably correct as it is. Admittedly
> confusing, though. A documentation clarification is probably in order.
> (Though pack() has always been a confusing method!)
>
> Instead of using pack to convert Unicode codepoints to strings, try the
> Integer#chr method, with the desired encoding as an argument. (Your
> comment system won't allow me to enter an example: it must think that
> I'm embedding JS or something).
>
> I don't know anything about the rexml library. But 1.9.0 is not
> really expected to be stable yet, and I suspect that there are a number
> of libraries that haven't been carefully ported yet.
>
> Like so much of Ruby, I think you've got to give the Unicode support a
> chance to grow on you. I don't understand why Matz made some of the
> choices he did, but they seem to work okay. Keep in mind, too, that the
> goal was not just to support Unicode but also to support Japanese
> encodings as well. So some of the design decisions might make a lot
> more sense to programmers who have to work with SJIS and EUC every day.
>
> Finally, Ruby does inherit the default external encoding from the locale
> if you don't specify an encoding with -K, -E or --encoding. This is the
> encoding assumed when you read from a file and do not specify a
> different encoding. (It is not used when you write to a file or read or
> write from a socket or pipe, however.) It respects the standard
> LC_CTYPE, LC_ALL, and LANG variables. Encoding.default_external returns
> the value. Encoding.locale_encoding didn't make it into 1.9.0, but it
> is in the current sources and returns the default encoding for the
> locale even if -K, -E, or --encoding is specified.
>
> (I attempt to explain all this in The Ruby Programming Language which
> should be in bookstores in about a month. I'm making the last-minute
> changes today.)
>
> David Flanagan
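
For reference, here is a minimal sketch of the Integer#chr suggestion quoted above, which the blog's comment system would not accept. It assumes a Ruby 1.9 or later interpreter. The codepoint U+2665 and the variable names are my own choices for illustration and are not taken from hearts.rb or the test scripts. It only compares the bytes that pack("U") and chr produce, and prints whatever Encoding.default_external reports for the current locale and command-line flags.

```ruby
# Illustrative codepoint (an assumption, not from the linked scripts):
# U+2665 BLACK HEART SUIT.
codepoint = 0x2665

# Integer#chr with an explicit encoding returns a one-character
# string tagged with that encoding.
via_chr = codepoint.chr(Encoding::UTF_8)

# Array#pack("U") serializes the codepoint as UTF-8 bytes.
via_pack = [codepoint].pack("U")

puts via_chr.encoding                        # encoding of the chr result
puts via_chr.bytes.to_a.inspect              # the UTF-8 bytes of the character
puts via_pack.bytes.to_a == via_chr.bytes.to_a

# The default external encoding, taken from the locale unless
# -K, -E or --encoding overrides it.
puts Encoding.default_external.name
```

Running it under different LANG or LC_ALL settings should show the last line changing while the first three stay the same.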