On Mon, Dec 15, 2008 at 11:51:15PM +0900, Yukihiro Matsumoto wrote:
> |irb(main):029:0> "\x61\x62\x63".encoding
> |=> #<Encoding:US-ASCII>
> |irb(main):028:0> "\x61\xc3\x9f".encoding
> |=> #<Encoding:ASCII-8BIT>
>
> This is old behavior. Now string literals are always in their source
> encoding. Try newer version.

That was 1.9.1-preview2. I have just built from trunk, and I get the same:

irb(main):001:0> RUBY_REVISION
=> 20768
irb(main):002:0> "\x61\x62\x63".encoding
=> #<Encoding:US-ASCII>
irb(main):003:0> "\x61\xc3\x9f".encoding
=> #<Encoding:ASCII-8BIT>
irb(main):004:0> __ENCODING__
=> #<Encoding:US-ASCII>

And:

$ cat ert.rb
p __ENCODING__
p Encoding.default_external
p Object.constants.grep(/RUBY/).map { |c| [c,Object.const_get(c)] }
p "\x61\x62\x63".encoding
p "\x61\xc3\x9f".encoding
$ ruby19 ert.rb
#<Encoding:US-ASCII>
#<Encoding:UTF-8>
[[:RUBY_VERSION, "1.9.1"], [:RUBY_RELEASE_DATE, "2008-12-16"],
 [:RUBY_PLATFORM, "i686-linux"], [:RUBY_PATCHLEVEL, 5000],
 [:RUBY_REVISION, 20768],
 [:RUBY_DESCRIPTION, "ruby 1.9.1 (2008-12-16 revision 20768) [i686-linux]"],
 [:RUBY_COPYRIGHT, "ruby - Copyright (C) 1993-2008 Yukihiro Matsumoto"],
 [:RUBY_ENGINE, "ruby"]]
#<Encoding:US-ASCII>
#<Encoding:ASCII-8BIT>

Well, I guess I can make sense of this:

- source encoding is US-ASCII, presumably by default
- if I embed hex escapes, the literal is promoted to ASCII-8BIT
- if I embed a UTF-8 codepoint directly in a literal, it raises an
  encoding error
- external encoding is UTF-8, presumably from environment

However I can add

# Encoding: binary

to the top of my source to get consistent encoding of literals.

B.
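
For anyone who wants to reproduce the comparison without editing files, here is a minimal sketch. It assumes that eval honours an encoding magic comment on the first line of the evaluated string, which lets one script try several source encodings. The helper name literal_encoding is made up for illustration and is not part of Ruby or of the session above. Newer Ruby versions default to a UTF-8 source encoding, so the magic comment is spelled out explicitly rather than relying on the default.

```ruby
# Hypothetical helper (not from the thread): report the encoding a string
# literal receives when parsed under the given source encoding.
# Assumption: eval honours a magic comment on the first line of its input.
def literal_encoding(source_encoding, literal_source)
  eval("# encoding: #{source_encoding}\n#{literal_source}.encoding")
end

# Single quotes keep the \x escapes as plain text until eval parses them.
ascii_only = '"\x61\x62\x63"'
high_bytes = '"\x61\xc3\x9f"'

%w[US-ASCII binary].each do |src|
  puts "source #{src}:"
  puts "  #{ascii_only} -> #{literal_encoding(src, ascii_only)}"
  puts "  #{high_bytes} -> #{literal_encoding(src, high_bytes)}"
end
```

Under a US-ASCII source the two literals should come out with different encodings, as in the irb session; under a binary source both should be ASCII-8BIT, which is the consistency the magic comment buys.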