On Mon, 27 Oct 2008 17:27:57 +1100, Nobuyoshi Nakada <nobu / ruby-lang.org> wrote:

> Even in 1.8 or prior, -Ks has been mandatory for Shift_JIS
> sources, so they have had -K in the shebang lines already.

Why, then, can I write a Ruby 1.8 script that does a "puts" of a Shift_JIS
string (no shebang or magic comment) and have it run fine without -Ks?

ruby1.8 t1.rb | od -c
0000000   S   h   i   f   t   _   J   I   S       s   t   r   i   n   g
0000020   :     202 240   , 202 242  \n
0000030

ruby1.8 -Ks t1.rb | od -c
0000000   S   h   i   f   t   _   J   I   S       s   t   r   i   n   g
0000020   :     202 240   , 202 242  \n
0000030

But on 1.9 it only works with -Ks:

ruby -v
ruby 1.9.0 (2008-10-27 revision 19961) [i686-linux]

ruby t1.rb
t1.rb:2: invalid multibyte char (US-ASCII)
t1.rb:2: invalid multibyte char (US-ASCII)

ruby -Ks t1.rb | od -c
0000000   S   h   i   f   t   _   J   I   S       s   t   r   i   n   g
0000020   :     202 240   , 202 242  \n
0000030
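
The "invalid multibyte char (US-ASCII)" complaint can be reproduced without a Shift_JIS source file at all. Here is a minimal sketch, assuming only the high bytes that od shows above (octal 202 240 and 202 242, i.e. 0x82 0xA0 and 0x82 0xA2). The variable names are mine, purely for illustration:

```ruby
# The two double-byte characters from the od output, written as raw bytes
# with the ASCII comma between them.
raw = "\x82\xA0,\x82\xA2"

# Tag copies of the same bytes with the two encodings in question.
as_ascii = raw.dup.force_encoding("US-ASCII")
as_sjis  = raw.dup.force_encoding("Shift_JIS")

puts as_ascii.valid_encoding?   # false: bytes above 0x7F are not US-ASCII
puts as_sjis.valid_encoding?    # true: the same bytes are legal Shift_JIS
```

The bytes are identical in both cases; only the encoding label that 1.9 attaches to them differs.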

>
>> Defaulting source encoding to locale encoding (like -e does) should fix
>> this (as long as the end-user's locale is correct), right?
>
> Yes if they match.
>
>> I guess if necessary James can put "-KU" in the RUBYOPT environment
>> variable to save having to add multiple magic comments, but I feel this
>> shouldn't be necessary.
>
> -U option would be better.

I don't think that will work:

t2.rb is a single-line script that does a "puts" of a short UTF-8
multibyte string.

ruby t2.rb
t2.rb:2: invalid multibyte char (US-ASCII)
t2.rb:2: invalid multibyte char (US-ASCII)

ruby -U t2.rb
ruby: "\xD8" on US-ASCII (Encoding::InvalidByteSequenceError)

ruby -KU t2.rb | od -c
0000000   U   n   i   c   o   d   e       s   t   r   i   n   g   :
0000020   a   b     330 265 330 271  \n
0000030

ruby1.8 t2.rb | od -c
0000000   U   n   i   c   o   d   e       s   t   r   i   n   g   :
0000020   a   b     330 265 330 271  \n
0000030
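
For reference, the per-file alternative mentioned in the quoted text is a magic comment on the first line. Below is a sketch of what a t2.rb carrying one might look like. The string content is my reconstruction from the od output (bytes 330 265 330 271), not the actual file, and `text` is just an illustrative name:

```ruby
# -*- coding: utf-8 -*-
# Hypothetical stand-in for t2.rb. The magic comment above has to be the
# first line of the file (or the second, after a shebang) to take effect.
text = "Unicode string: ab صع"

puts __ENCODING__           # UTF-8: source encoding set by the magic comment
puts text.encoding          # UTF-8: literals inherit the source encoding
puts text.valid_encoding?   # true
```

With the comment in place, the file should not need -KU on the command line or in RUBYOPT, at the cost of repeating the comment in every source file.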


Cheers
Mike