Dave Thomas wrote:
> 
> On Jan 27, 2008, at 12:31 PM, Sam Ruby wrote:
> 
>> Before:
>>
>> $ irb
>> irb(main):001:0> RUBY_DESCRIPTION
>> => "ruby 1.9.0 (2007-12-25 revision 14709) [i686-linux]"
>> irb(main):002:0> "\u00a0"
>> => " "
>> irb(main):003:0> "\u00a0".encoding
>> => #<Encoding:UTF-8>
>> irb(main):004:0>
>>
>> Now:
>>
>> $ irb
>> irb(main):001:0> RUBY_DESCRIPTION
>> => "ruby 1.9.0 (2008-01-28 revision 0) [i686-linux]"
>> irb(main):002:0> "\u00a0"
>> => "\xC2\xA0"
>> irb(main):003:0> "\u00a0".encoding
>> => #<Encoding:ASCII-8BIT>
>> irb(main):004:0>
> 
> Sam:
> 
> Try it outside of irb
> 
> dave[RUBY3/Book 14:09:51] ruby -v -e 'p "\u00a0".encoding'
> ruby 1.9.0 (2008-01-28 revision 0) [i686-darwin9.1.0]
> #<Encoding:UTF-8>
> 
> I haven't figured out yet a decent way of setting the source encoding 
> for irb.

Dang.  I had hoped that the presence of a \u would unambiguously
indicate that the string was encoded as utf-8.  A shorthand that
generates the bytes corresponding to a Unicode character when the
encoding in effect is, in fact, utf-8, but that will likely generate
something other than what you would expect when the encoding in effect
is anything else, is going to cause much confusion.  Add to that the
confusion that will be generated by having irb act differently...
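For anyone who wants to check what a given Ruby actually does, here is a
small sketch.  It assumes a Ruby release newer than the 2008 snapshot
discussed above; that snapshot behaved differently, as the transcripts
show.  It prints the encoding and bytes that "\u00a0" produces, then
re-tags the same bytes as binary to reproduce the "\xC2\xA0" /
ASCII-8BIT result seen in irb.  The variable names are purely
illustrative.

```ruby
# encoding: us-ascii
# Sketch only; assumes a Ruby newer than the 2008 1.9.0 snapshot.
# The magic comment above declares a non-UTF-8 source encoding, so we can
# see whether \u still yields a UTF-8 string.

s = "\u00a0"               # U+00A0 NO-BREAK SPACE via the \u escape

# The escape expands to the UTF-8 byte sequence for U+00A0.
bytes = s.bytes

# Tagging those same bytes as binary mimics the ASCII-8BIT result irb gave.
binary = s.dup.force_encoding(Encoding::ASCII_8BIT)

puts s.encoding            # encoding chosen for the \u literal
puts bytes.inspect         # raw bytes behind the literal
puts binary.inspect        # how the bytes look when treated as binary
```

If \u forces utf-8, the first line prints UTF-8 even under the
us-ascii magic comment.  The last line shows why the irb output looked
like garbage: same bytes, different encoding tag.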

If \u is not to have the behavior of forcing the encoding of the
enclosing string to utf-8, I would suggest that raising a syntax error
would be much preferable.

- Sam Ruby