2010/9/2 Roger Pack <rogerdpack2 / gmail.com>:
>> Japanese version of Windows uses CP932 (a.k.a. SJIS or Windows-31J).
>> And its command prompt uses CP932; it's not UTF-8 and can't use UTF-8.
>> So we must follow locale information.
>> (almost always locale reflects terminal's encoding)
>
> Good to know.
>
> I don't mean this to be argumentative, but I still have some feedback.
> You can just gloss over it if you're done with the discussion :)
>
> I guess my only concern is that if you read in binary data, users of
> 1.9 *must* specify binary mode
> (which is reasonable), but here is my confusion:
>
> $ cat test.rb
> p File.read('other_file_unknown_encoding').encoding
> p 'abc'.encoding
>
> $ ruby test.rb
> #<Encoding:IBM437>
> #<Encoding:US-ASCII>
>
> For better or worse, even though 'other_file_unknown_encoding' has
> only ASCII characters, its encoding is set to my system default. This
> means that if I package up the file "other_file_unknown_encoding" in
> my gem and read it later, I *must* specify its encoding when I read
> it. [...]
> adds extra confusion.

If the encoding of a file is unknown, you need to specify its encoding.
An application should get the encoding of an external file from its users
and use it when it opens the file.

Yes, you *must* specify its encoding.
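A minimal sketch of that advice, assuming the user supplies the encoding name (for example through a command-line option). The helper `read_with_user_encoding` and the file name are hypothetical. `Encoding.find` raises ArgumentError for an unknown name, so a typo fails loudly instead of silently falling back to the locale.

```ruby
require "tmpdir"

# Hypothetical helper: read a file in an encoding named by the user
# instead of the locale-derived default.
def read_with_user_encoding(path, encoding_name)
  enc = Encoding.find(encoding_name) # ArgumentError for unknown names
  File.read(path, encoding: enc)
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "external.txt")
  File.write(path, "abc")

  # "CP932" is an alias, so this is tagged Windows-31J.
  p read_with_user_encoding(path, "CP932").encoding

  begin
    read_with_user_encoding(path, "no-such-encoding")
  rescue ArgumentError => e
    puts "rejected: #{e.message}"
  end
end
```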

> Suggestion in this regard: for file input, do not use the system to
> determine encoding. [...]
> somewhere so they realize the implications of what is happening.

Yeah, if you don't want the encoding decided dynamically from the locale,
you can specify File.read("foo.txt", encoding: "UTF-8").
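For comparison, a sketch of three ways to avoid the locale when reading: the `encoding:` keyword, the equivalent `"r:UTF-8"` mode string, and binary mode via `File.binread`, which covers the binary-data case raised earlier. The helper `encodings_for` and the file name are made up for the example, and it assumes `Encoding.default_internal` is unset (the default), so no transcoding happens.

```ruby
require "tmpdir"

# Hypothetical helper: read one file three ways and return the
# encoding tag of each resulting string.
def encodings_for(path)
  by_keyword = File.read(path, encoding: "UTF-8") # explicit external encoding
  by_mode    = File.open(path, "r:UTF-8", &:read) # same, via the mode string
  binary     = File.binread(path)                 # raw bytes, tagged ASCII-8BIT
  [by_keyword.encoding, by_mode.encoding, binary.encoding]
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "foo.txt")
  File.write(path, "foo bar\n")
  p encodings_for(path)
end
```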

> The other surprising thing to me is that it assigns IBM437 not only to
> terminal input but also to file input. Nobody* edits files in
> IBM437. [...]
> really, to have a default for file encoding.

Try echo "foo bar" > foo.txt in that command prompt; the file it creates is in the console's encoding.
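The locale-derived defaults discussed in this thread can be inspected directly. A sketch, assuming `Encoding.default_internal` is unset (the default); the printed names vary by machine. On Windows the locale typically follows the console code page, which would explain the IBM437 in Roger's script, and newer Ruby releases may choose a different default than the 1.9 series discussed here.

```ruby
require "tmpdir"

# Encoding of the current locale, as Ruby derived it from the environment.
locale_enc = Encoding.find("locale")
# Encoding that File.read tags data with when no encoding is given.
default_enc = Encoding.default_external

puts "locale:           #{locale_enc}"
puts "default_external: #{default_enc}"

Dir.mktmpdir do |dir|
  path = File.join(dir, "foo.txt")
  File.write(path, "foo bar\n")
  # No encoding: option, so the string is tagged with default_external
  # (default_internal being nil), whatever bytes the file actually holds.
  puts "File.read tagged: #{File.read(path).encoding}"
end
```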

> Suggestion in this regard: [...]
> encoding for reading files, or (if I don't) it should default to some
> less desirable default, like BINARY. [...]
> what a file encoding was, if it isn't specified?

specify it.

> If this were a word
> processor, I would expect local files to be written in the default
> locale, but this is reading arbitrary files, so I think it should,
> again, force people to *at least once* specify their own default
> external encoding, so that they realize what is going on behind the
> scenes.

You can use -E or -U or Encoding.default_external=.
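A sketch of the in-script option, assuming the goal is to make UTF-8 the default for every file the program reads without an explicit encoding. The command-line forms appear only as comments: per `ruby -h`, `-E` sets the default external (and optionally internal) encoding, while `-U` sets the default internal encoding to UTF-8. The file name is just an example.

```ruby
require "tmpdir"

# Command-line equivalents:
#   ruby -E UTF-8 test.rb   # default external encoding = UTF-8
#   ruby -U test.rb         # default internal encoding = UTF-8
# In-script equivalent; set it early, before any files are opened.
Encoding.default_external = Encoding::UTF_8

Dir.mktmpdir do |dir|
  path = File.join(dir, "other_file_unknown_encoding")
  File.write(path, "abc")
  # Now tagged UTF-8 instead of the locale-derived default.
  p File.read(path).encoding
end
```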

-- 
NARUSE, Yui
naruse / airemix.jp