Hi,

Paul Brannan wrote:
> NARUSE, Yui wrote:
>> This is the spec.  STDIN encoding will be the locale's when there is no
>> magic comment, -K, or -E.
> 
> Thank you for the table.  This makes a lot of sense.  What doesn't make 
> sense to me is that I can read invalid strings:
> 
> irb(main):017:0> File.open('/tmp/foo', 'w') { |f| f.puts "\x81" }
> => nil
> irb(main):018:0> s = File.open('/tmp/foo') { |f| f.gets }
> => "\x81\n"
> irb(main):019:0> s.encoding
> => #<Encoding:US-ASCII>
> irb(main):020:0> "\x81" =~ /foo/
> => nil
> irb(main):021:0> s =~ /foo/
> ArgumentError: broken US-ASCII string
>         from (irb):21
>         from /usr/local/lib/ruby/1.9.0/irb.rb:149:in `block (2 levels) in eval_input'
>         from /usr/local/lib/ruby/1.9.0/irb.rb:262:in `signal_status'
>         from /usr/local/lib/ruby/1.9.0/irb.rb:146:in `block in eval_input'
>         from /usr/local/lib/ruby/1.9.0/irb.rb:145:in `eval_input'
>         from /usr/local/lib/ruby/1.9.0/irb.rb:69:in `block in start'
>         from /usr/local/lib/ruby/1.9.0/irb.rb:68:in `catch'
>         from /usr/local/lib/ruby/1.9.0/irb.rb:68:in `start'
>         from /usr/local/bin/irb-1.9:12:in `<main>'
> 
> Is this behavior also intended?  Can/should I change the locale/encoding 
> of my input streams?

The default IO encoding follows Encoding.default_external, so this is the
following case:

Normal script case            | script encoding | default external |
------------------------------+-----------------+------------------+
no -K/-E, no magic comment    | US-ASCII        | ->locale<-       |
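A minimal sketch of the rule in the table, assuming a current Ruby and the standard tempfile library: when File.open gets neither an "r:ENC" mode string nor an encoding: option, the IO's external encoding is Encoding.default_external, and lines read from it are tagged with that encoding.

```ruby
require 'tempfile'

ext_enc = line_enc = nil
Tempfile.create('enc-default') do |tmp|
  # Contents don't matter here; we only look at how the IO tags them.
  tmp.write("hello\n")
  tmp.flush

  # No "r:ENC" and no encoding: option, so the IO falls back to
  # Encoding.default_external (taken from the locale unless -E/-K set it).
  File.open(tmp.path) do |f|
    ext_enc  = f.external_encoding
    line_enc = f.gets.encoding
  end
end

p ext_enc
p ext_enc == Encoding.default_external
p line_enc
```

If Encoding.default_internal is set, lines are additionally transcoded into it, so line_enc would report that encoding instead.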

Your locale seems to be LANG=C, so the input string is tagged US-ASCII.
 > irb(main):019:0> s.encoding
 > => #<Encoding:US-ASCII>
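To reproduce this without changing the shell locale, the sketch below temporarily sets Encoding.default_external to US-ASCII as a stand-in for LANG=C (an assumption: in a real run the value comes from the locale at startup). Reading only tags the bytes and does not validate them, so the line comes back as an invalid US-ASCII string and the regexp match raises ArgumentError. The exact message text varies by Ruby version.

```ruby
require 'tempfile'

line = nil
Tempfile.create('enc-locale') do |tmp|
  # Write the raw byte 0x81 plus a newline; binmode means no transcoding.
  tmp.binmode
  tmp.write("\x81\n".b)
  tmp.flush

  # Assumption: forcing default_external to US-ASCII approximates
  # `env LC_ALL=C`. The original value is restored afterwards.
  saved = Encoding.default_external
  begin
    Encoding.default_external = Encoding::US_ASCII
    line = File.open(tmp.path) { |f| f.gets }
  ensure
    Encoding.default_external = saved
  end
end

p line.encoding          # tagged with default_external, not validated
p line.valid_encoding?   # 0x81 is outside US-ASCII

match_raised = begin
  line =~ /foo/
  false
rescue ArgumentError
  true
end
p match_raised
```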

 > irb(main):020:0> "\x81" =~ /foo/
 > => nil
As for why "\x81" =~ /foo/ does not raise an error: the encoding of the
literal "\x81" is ASCII-8BIT.
irb(main):008:0> "\x81".encoding
=> #<Encoding:ASCII-8BIT>
(Ruby reads the source bytes \x22\x5C\x78\x38\x31\x22 and creates the
ASCII-8BIT string "\x81".)
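A small sketch of both points, with one assumption called out: since Ruby 2.0 source files default to UTF-8, where a bare "\x81" literal is an (invalid) UTF-8 string rather than ASCII-8BIT, so String#b is used here to get the ASCII-8BIT string that a US-ASCII 1.9 source produced. Binary strings are always valid, so the match simply fails; the same byte tagged US-ASCII makes the match raise.

```ruby
# The six source bytes that spell the literal "\x81". Single quotes keep
# the backslash escape uninterpreted.
source_bytes = '"\x81"'.bytes.map { |b| format('\x%02X', b) }
p source_bytes

# String#b returns an ASCII-8BIT copy, standing in for the 1.9 literal.
bin = "\x81".b
p bin.encoding
p bin.valid_encoding?   # any byte sequence is valid binary
p(bin =~ /foo/)         # no match, and no ArgumentError

# The same byte tagged US-ASCII is broken, so matching raises.
broken = "\x81".b.force_encoding(Encoding::US_ASCII)
raised = begin
  broken =~ /foo/
  false
rescue ArgumentError
  true
end
p raised
```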


If you want to use another encoding, for example UTF-8, you can specify
the encoding explicitly when opening the file:

 > irb(main):018:0> s = File.open('/tmp/foo', 'r:utf-8') { |f| f.gets }

% env LC_ALL=C irb19
irb(main):001:0> s = File.open('/tmp/foo') { |f| f.gets }
=> "\x81\n"
irb(main):002:0> s.encoding
=> #<Encoding:US-ASCII>
irb(main):003:0> s = File.open('/tmp/foo','r:utf-8') { |f| f.gets }
=> "\x81\n"
irb(main):004:0> s.encoding
=> #<Encoding:UTF-8>
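A sketch of the explicit options in current Ruby, assuming the standard tempfile library and File.open's encoding: keyword. Note that tagging the input UTF-8 changes only the label: a lone 0x81 is not valid UTF-8 either, so s =~ /foo/ would still raise on it. Opening in binary mode ("rb") tags the data ASCII-8BIT, which regexps accept.

```ruby
require 'tempfile'

lines = {}
Tempfile.create('enc-explicit') do |tmp|
  # Same single byte 0x81 plus newline as /tmp/foo; binmode avoids transcoding.
  tmp.binmode
  tmp.write("\x81\n".b)
  tmp.flush

  # Mode-string form, as in the irb session above.
  lines[:mode_string] = File.open(tmp.path, 'r:utf-8') { |f| f.gets }
  # Keyword form: the external encoding passed as an option instead.
  lines[:keyword] = File.open(tmp.path, encoding: 'UTF-8') { |f| f.gets }
  # Binary mode: data is tagged ASCII-8BIT.
  lines[:binary] = File.open(tmp.path, 'rb') { |f| f.gets }
end

lines.each do |how, s|
  puts "#{how}: #{s.encoding} valid=#{s.valid_encoding?}"
end

# Only the binary-tagged line can be matched without an ArgumentError.
p(lines[:binary] =~ /foo/)
```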


-- 
NARUSE, Yui  <naruse / airemix.com>
DBDB A476 FDBD 9450 02CD 0EFC BCE3 C388 472E C1EA