On Mon, Dec 15, 2008 at 11:51:15PM +0900, Yukihiro Matsumoto wrote:
> |irb(main):029:0> "\x61\x62\x63".encoding
> |=> #<Encoding:US-ASCII>
> |irb(main):028:0> "\x61\xc3\x9f".encoding
> |=> #<Encoding:ASCII-8BIT>
> 
> This is old behavior.  Now string literals are always in their source
> encoding.  Try newer version.

That was 1.9.1-preview2. I have just built from trunk, and I get the same:

irb(main):001:0> RUBY_REVISION
=> 20768
irb(main):002:0> "\x61\x62\x63".encoding
=> #<Encoding:US-ASCII>
irb(main):003:0> "\x61\xc3\x9f".encoding
=> #<Encoding:ASCII-8BIT>
irb(main):004:0> __ENCODING__
=> #<Encoding:US-ASCII>


And:

$ cat ert.rb
p __ENCODING__
p Encoding.default_external
p Object.constants.grep(/RUBY/).map { |c| [c,Object.const_get(c)] }
p "\x61\x62\x63".encoding
p "\x61\xc3\x9f".encoding

$ ruby19 ert.rb
#<Encoding:US-ASCII>
#<Encoding:UTF-8>
[[:RUBY_VERSION, "1.9.1"], [:RUBY_RELEASE_DATE, "2008-12-16"], [:RUBY_PLATFORM, "i686-linux"], [:RUBY_PATCHLEVEL, 5000], [:RUBY_REVISION, 20768], [:RUBY_DESCRIPTION, "ruby 1.9.1 (2008-12-16 revision 20768) [i686-linux]"], [:RUBY_COPYRIGHT, "ruby - Copyright (C) 1993-2008 Yukihiro Matsumoto"], [:RUBY_ENGINE, "ruby"]]
#<Encoding:US-ASCII>
#<Encoding:ASCII-8BIT>

Well, I guess I can make sense of this:
- the source encoding is US-ASCII, presumably by default
- if I embed hex escapes with the high bit set (like \xc3), the literal is
  promoted to ASCII-8BIT; plain 7-bit escapes leave it as US-ASCII
- if I embed a non-ASCII UTF-8 character directly in a literal, it raises
  an encoding error
- the external encoding is UTF-8, presumably taken from my environment
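
To check the two literal cases without depending on what the default source
encoding happens to be, here is a rough sketch. It writes a throwaway file
with an explicit "# encoding: us-ascii" magic comment and loads it. The
helper name load_as_us_ascii and the global $ascii_src_result are made up
for this illustration; they are not part of Ruby.

```ruby
require "tempfile"

# Hypothetical helper (not a Ruby API): put `body` under a us-ascii magic
# comment, load the file, and return either the value the body stored in
# $ascii_src_result or the class of the SyntaxError raised while parsing.
def load_as_us_ascii(body)
  Tempfile.create(["ascii_src", ".rb"]) do |f|
    f.binmode                                  # write the bytes untouched
    f.write("# encoding: us-ascii\n#{body}\n")
    f.flush
    load f.path
  end
  $ascii_src_result
rescue SyntaxError => e
  e.class
end

# Hex escapes only: the generated source file is pure ASCII.
escaped = load_as_us_ascii(
  '$ascii_src_result = ["\x61\x62\x63".encoding, "\x61\xc3\x9f".encoding]'
)

# Raw UTF-8 bytes for U+00DF placed directly inside the literal.
raw = load_as_us_ascii("$ascii_src_result = \"a\u00DF\".encoding")

p escaped
p raw
```

If the reading above is right, the first result should be the US-ASCII /
ASCII-8BIT pair seen in the transcript, and the second should be a
SyntaxError class rather than an encoding.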

However, I can add

# Encoding: binary

to the top of my source to get consistent encoding of literals.
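
A small sketch of that comparison, assuming the magic comment is the only
thing that changes between runs. The helper literal_encodings and the global
$enc_demo are names invented for this example.

```ruby
require "tempfile"

# Hypothetical helper: generate a two-line script whose first line is the
# given magic comment, load it, and return the encodings of the two
# literals from the transcript above.
def literal_encodings(magic)
  Tempfile.create(["enc_demo", ".rb"]) do |f|
    # Single-quoted heredoc: the \x escapes reach the file as written.
    f.write(<<~'SRC'.sub("MAGIC", magic))
      # encoding: MAGIC
      $enc_demo = ["\x61\x62\x63".encoding, "\x61\xc3\x9f".encoding]
    SRC
    f.flush
    load f.path
  end
  $enc_demo
end

p literal_encodings("us-ascii")  # mixed encodings, as in the transcript
p literal_encodings("binary")    # both literals share one encoding
```

With "binary" both literals should come back as ASCII-8BIT, which is the
consistency I was after.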

B.