Dave Thomas wrote:
>
> On Jan 27, 2008, at 12:31 PM, Sam Ruby wrote:
>
>> Before:
>>
>> $ irb
>> irb(main):001:0> RUBY_DESCRIPTION
>> => "ruby 1.9.0 (2007-12-25 revision 14709) [i686-linux]"
>> irb(main):002:0> "\u00a0"
>> => " "
>> irb(main):003:0> "\u00a0".encoding
>> => #<Encoding:UTF-8>
>> irb(main):004:0>
>>
>> Now:
>>
>> $ irb
>> irb(main):001:0> RUBY_DESCRIPTION
>> => "ruby 1.9.0 (2008-01-28 revision 0) [i686-linux]"
>> irb(main):002:0> "\u00a0"
>> => "\xC2\xA0"
>> irb(main):003:0> "\u00a0".encoding
>> => #<Encoding:ASCII-8BIT>
>> irb(main):004:0>
>
> Sam:
>
> Try it outside of irb
>
> dave[RUBY3/Book 14:09:51] ruby -v -e 'p "\u00a0".encoding'
> ruby 1.9.0 (2008-01-28 revision 0) [i686-darwin9.1.0]
> #<Encoding:UTF-8>
>
> I haven't figured out yet a decent way of setting the source encoding
> for irb.

Dang. I had hoped that the presence of a \u would unambiguously
indicate that the string was encoded as UTF-8.

A shorthand that generates the bytes corresponding to the Unicode
character when the actual encoding is in fact UTF-8, but that will
likely produce something other than what you would expect when any
other encoding is in effect, is likely to cause much confusion. Add to
that the confusion caused by irb behaving differently...

If \u is not going to force the encoding of the enclosing string to
UTF-8, I would suggest that throwing a syntax error would be much
preferable.

- Sam Ruby
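
To make the byte-level concern concrete, here is a minimal sketch of what happens when the bytes produced by "\u00a0" are read under an encoding other than UTF-8. It assumes a Ruby in which the "\u00a0" literal itself is tagged UTF-8 (as in the `ruby -e` run quoted above); the helper names `nbsp_bytes` and `reinterpret` are hypothetical and exist only for illustration.

```ruby
# Illustrative sketch only; nbsp_bytes and reinterpret are hypothetical helpers.

# The bytes that "\u00a0" (NO-BREAK SPACE) occupies when encoded as UTF-8.
def nbsp_bytes
  "\u00a0".bytes
end

# Re-tag a copy of the string with another encoding. The bytes are not
# converted, only the label that says how to read them changes.
def reinterpret(str, encoding)
  str.dup.force_encoding(encoding)
end

utf8   = "\u00a0"
binary = reinterpret(utf8, Encoding::BINARY)     # a.k.a. ASCII-8BIT
latin1 = reinterpret(utf8, Encoding::ISO_8859_1)

# Same two bytes in every case: C2 A0.
p nbsp_bytes.map { |b| format("%02X", b) }

# Read as binary, the string is two one-byte characters, not one.
p binary.length

# Read as ISO-8859-1, the two bytes are U+00C2 followed by U+00A0,
# which is not the single character the \u escape asked for.
p latin1.encode(Encoding::UTF_8).codepoints.map { |c| format("U+%04X", c) }
```

The bytes never change here; only the encoding tag does, which is why the result of a \u escape is surprising whenever the enclosing string is treated as anything other than UTF-8.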