2010/9/2 Roger Pack <rogerdpack2 / gmail.com>:
>> Japanese version of Windows uses CP932 (a.k.a. SJIS or Windows-31J).
>> And its command prompt uses CP932; it's not UTF-8 and can't use UTF-8.
>> So we must follow locale information.
>> (almost always locale reflects terminal's encoding)
>
> Good to know.
>
> I don't mean this as argumentative, but I still have some feedbacks.
> You can just gloss over them if you're done with the discussion :)
>
> I guess my only concern is that if you read in binary data, users of
> 1.9 *must* specify binary mode.
> (which is reasonable), but here is my confusion:
>
> $ cat test.rb
> p File.read('other_file_unknown_encoding').encoding
> p 'abc'.encoding
>
> $ ruby test.rb
> #<Encoding:IBM437>
> #<Encoding:US-ASCII>
>
> for better or worse, even though 'other_file_unknown_encoding' has
> only ASCII characters, its encoding is set to my system default. This
> means that if I package up the file "other_file_unknown_encoding" in
> my gem, and read it later, I *must* specify its encoding when I read
> it. It will succeed locally, and fail on another box, which to me
> adds extra confusion.

If the encoding of a file is unknown, the encoding needs to be specified.
An application should get the encoding of an external file from its users
and use it when the app opens the file.
Yes, you *must* specify its encoding.

> Suggestion in this regard: for file input do not use the system to
> determine encoding. Force users to specify one at least once in code
> somewhere so they realize the implications of what is happening.

Yeah, if you don't want the encoding decided dynamically from the locale,
you can specify it: File.read("foo.txt", encoding: "UTF-8").

> The other surprising thing to me is that it assigns IBM437 not only to
> terminal input but also to file input. Nobody* edits files in
> IBM437. It feels inappropriate to use that for the encoding or
> really, to have a default for file encoding.

Try echo "foo bar" > foo.txt.

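
For reference, here is a minimal sketch of the three ways of reading a file that come up above: relying on the locale, passing encoding: explicitly, and reading in binary mode. The temporary file and the variable names are only for illustration and are not from the thread. It assumes Encoding.default_internal is unset, which is the usual case.

```ruby
require "tempfile"

# Illustrative file containing the UTF-8 bytes for "café" (é is 0xC3 0xA9).
file = Tempfile.new("encoding_demo")
file.binmode
file.write("caf\xC3\xA9".b)
file.close

# No encoding given: the string is tagged with Encoding.default_external,
# which Ruby derives from the locale unless it has been overridden.
by_locale = File.read(file.path)

# Explicit external encoding: the result no longer depends on the machine.
as_utf8 = File.read(file.path, encoding: "UTF-8")

# Binary read: the bytes are tagged BINARY (ASCII-8BIT) and left uninterpreted.
as_binary = File.binread(file.path)

p by_locale.encoding   # varies by machine
p as_utf8.encoding
p as_binary.encoding

file.unlink
```

Only the first read can change from one box to another; the other two name their encoding at the call site.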
> Suggestion in this regard: It should force me to assign an explicit
> encoding for reading files, or (if I don't) it should default to some
> less desirable default, like BINARY. How else can you really know
> what a file encoding was, if it isn't specified?

Specify it.

> If this were a word
> processor, I would expect local files to be written in the default
> locale, but this is reading arbitrary files, so I think it should,
> again, force people to *at least once* specify their own default
> external encoding, so that they realize what is going on behind the
> scenes.

You can use -E or -U or Encoding.default_external=.

-- 
NARUSE, Yui
naruse / airemix.jp
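
A rough sketch of the last option, setting Encoding.default_external once so that later reads without an explicit encoding stop following the locale. The temporary file and variable names are hypothetical, and the original value is restored at the end only to keep the example side-effect free. On the command line, -E takes the encoding name, for example ruby -E UTF-8 test.rb.

```ruby
require "tempfile"

# Remember the locale-derived default so it can be restored afterwards.
original = Encoding.default_external

# Illustrative ASCII-only file, like the one in the question.
file = Tempfile.new("default_external_demo")
file.write("abc")
file.close

# Set the process-wide default once, early in the program.
Encoding.default_external = Encoding::UTF_8

# Reads that give no encoding now pick up the new default.
after_override = File.read(file.path).encoding
p after_override

Encoding.default_external = original
file.unlink
```

Because the setting is process-wide, an application would normally make it once at startup rather than toggling it around individual reads.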