Hi,

Paul Brannan wrote:
> NARUSE, Yui wrote:
>> This is spec. STDIN encoding will be locale when no magic comment and
>> -K and -E.
>
> Thank you for the table. This makes a lot of sense. What doesn't make
> sense to me is that I can read invalid strings:
>
> irb(main):017:0> File.open('/tmp/foo', 'w') { |f| f.puts "\x81" }
> => nil
> irb(main):018:0> s = File.open('/tmp/foo') { |f| f.gets }
> => "\x81\n"
> irb(main):019:0> s.encoding
> => #<Encoding:US-ASCII>
> irb(main):020:0> "\x81" =~ /foo/
> => nil
> irb(main):021:0> s =~ /foo/
> ArgumentError: broken US-ASCII string
>         from (irb):21
>         from /usr/local/lib/ruby/1.9.0/irb.rb:149:in `block (2 levels)
> in eval_input'
>         from /usr/local/lib/ruby/1.9.0/irb.rb:262:in `signal_status'
>         from /usr/local/lib/ruby/1.9.0/irb.rb:146:in `block in eval_input'
>         from /usr/local/lib/ruby/1.9.0/irb.rb:145:in `eval_input'
>         from /usr/local/lib/ruby/1.9.0/irb.rb:69:in `block in start'
>         from /usr/local/lib/ruby/1.9.0/irb.rb:68:in `catch'
>         from /usr/local/lib/ruby/1.9.0/irb.rb:68:in `start'
>         from /usr/local/bin/irb-1.9:12:in `<main>'
>
> Is this behavior also intended? Can/should I change the locale/encoding
> of my input streams?

The default IO encoding follows Encoding.default_external, so your
situation is the following case:

Normal script case            | script encoding | default external |
------------------------------+-----------------+------------------+
no -K -E, no magic comment    | US-ASCII        | ->locale<-       |

Your locale seems to be LANG=C, so the input string is US-ASCII.

> irb(main):019:0> s.encoding
> => #<Encoding:US-ASCII>
> irb(main):020:0> "\x81" =~ /foo/
> => nil

As for why "\x81" =~ /foo/ is not an error: the encoding of the literal
"\x81" is ASCII-8BIT.

irb(main):008:0> "\x81".encoding
=> #<Encoding:ASCII-8BIT>

(Ruby received the byte sequence \x22\x5C\x78\x38\x31\x22 and created the
ASCII-8BIT string "\x81" from it.)

If you want another encoding, for example UTF-8, you can specify it
explicitly when opening the file.
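
The behavior described above can be reproduced in a standalone script. The following is only a minimal sketch, assuming a Ruby that has String#b (2.0 or later). The helper name read_back is hypothetical, and the explicit "r:us-ascii" mode stands in for what a LANG=C locale would select as the default external encoding. It reads the same byte back under three external encodings and records whether the result is a valid string in that encoding, which appears to be the condition behind the "broken US-ASCII string" error.

```ruby
require "tempfile"

# Hypothetical helper (not from the original thread): writes raw bytes to a
# temporary file, reads the first line back under each open mode, and returns
# { mode => [encoding, valid_encoding?] }.
def read_back(bytes, modes = ["r:us-ascii", "r:utf-8", "rb"])
  Tempfile.create("encoding-demo") do |tmp|
    tmp.binmode          # write the bytes untouched
    tmp.write(bytes)
    tmp.flush            # make sure the bytes are on disk before re-opening
    modes.map do |mode|
      line = File.open(tmp.path, mode) { |f| f.gets }
      [mode, [line.encoding, line.valid_encoding?]]
    end.to_h
  end
end

# The single byte 0x81 followed by a newline, as in the example above.
report = read_back("\x81\n".b)

report.each do |mode, (encoding, valid)|
  puts "#{mode}: encoding=#{encoding} valid=#{valid}"
end
```

Under these assumptions the external encoding only changes the tag on the string, not its bytes: 0x81 is not a valid character in US-ASCII, and on its own it is not valid UTF-8 either, so only the binary read yields a valid string.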
> irb(main):018:0> s = File.open('/tmp/foo', 'r:utf-8') { |f| f.gets }

% env LC_ALL=C irb19
irb(main):001:0> s = File.open('/tmp/foo') { |f| f.gets }
=> "\x81\n"
irb(main):002:0> s.encoding
=> #<Encoding:US-ASCII>
irb(main):003:0> s = File.open('/tmp/foo','r:utf-8') { |f| f.gets }
=> "\x81\n"
irb(main):004:0> s.encoding
=> #<Encoding:UTF-8>

-- 
NARUSE, Yui  <naruse / airemix.com>
DBDB A476 FDBD 9450 02CD 0EFC BCE3 C388 472E C1EA