Hi,

In message "Re: [ruby-core:19473] Re: Default source encoding (Was: [Bug #680] csv.rb: CSV.parse is toolate when encoding is mismatch)"
    on Fri, 24 Oct 2008 16:48:04 +0900, "Michael Selig" <michael.selig / fs.com.au> writes:

|The problem I am trying to solve is the compatibility of string literals  
|in your source and strings from other sources.
|
|"default_internal" was introduced to try to make all strings the same  
|encoding to avoid incompatibilities. But at the moment string literals  
|seem to default to the source encoding or to UTF-8 if it is not set  
|(please correct me if I am wrong). What I was suggesting was a way to make  
|string literals be compatible.

You are correct here.
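Here is a minimal sketch of the behavior confirmed above. It assumes a current Ruby (2.x or later), and the ISO-8859-1 input is only an example. When Encoding.default_internal is set, text read through IO is transcoded into it. A string literal keeps the script's source encoding. The helper read_with_default_internal is hypothetical.

```ruby
require "tempfile"

# Hypothetical helper: writes +bytes+ to a temporary file, then reads it back
# declaring +external+ as the file's encoding, with Encoding.default_internal
# temporarily set to +internal+.
def read_with_default_internal(bytes, external, internal)
  saved = Encoding.default_internal
  Tempfile.create("enc-demo") do |f|
    f.binmode
    f.write(bytes)
    f.flush
    Encoding.default_internal = internal
    # With default_internal set, IO transcodes from the external encoding.
    File.read(f.path, encoding: external)
  end
ensure
  Encoding.default_internal = saved
end

# "caf\xE9" is "café" in ISO-8859-1 (0xE9 is "é").
input = read_with_default_internal("caf\xE9".b, "ISO-8859-1", Encoding::UTF_8)
p input.encoding                      # => #<Encoding:UTF-8>
p input == "caf\u00E9"                # => true

# A literal is not transcoded this way: it carries the source encoding.
p "literal".encoding == __ENCODING__  # => true
```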

|This normally isn't a problem if:
|a) All string literals are 7 bit ASCII, or
|b) The source encoding matches "default_internal"
|
|If the source encoding of a program containing non-ascii string literals  
|is set different from default_internal, you are asking for trouble, and  
|would defeat the purpose of default_internal. Therefore to prevent the  
|programmer from having to remember to specify both, it makes sense to me  
|that the source encoding should default to default_internal. I think this  
|is important.
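This small sketch makes conditions (a) and (b) concrete. It assumes Ruby 2.x or later. The ISO-8859-1 string stands in for a literal from a differently encoded source and is simulated with force_encoding. try_concat is a hypothetical helper that returns nil when Ruby refuses to mix two encodings.

```ruby
# Hypothetical helper: returns a + b, or nil when Ruby refuses to combine
# the two strings' encodings.
def try_concat(a, b)
  a + b
rescue Encoding::CompatibilityError
  nil
end

ascii_literal = "abc"        # 7-bit ASCII only
utf8_literal  = "caf\u00E9"  # non-ASCII, UTF-8
# Stand-in for the same word coming from an ISO-8859-1 source.
latin1_text   = "caf\xE9".b.force_encoding(Encoding::ISO_8859_1)

# (a) An ASCII-only literal combines with the ISO-8859-1 text.
p try_concat(ascii_literal, latin1_text).encoding  # => #<Encoding:ISO-8859-1>

# Non-ASCII strings in different encodings cannot be combined.
p try_concat(utf8_literal, latin1_text)            # => nil

# (b) Once both sides share one encoding, which is what default_internal
# aims for, they mix without error.
p try_concat(utf8_literal, latin1_text.encode(Encoding::UTF_8)) == "caf\u00E9" * 2  # => true
```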

The point is that when source code is written in a given source
encoding, its literals are naturally encoded in that encoding.  So
should we convert string literals into default_internal?  Conversion
can bring us more trouble, since it tends to change meaning.  For
example, what would /[<a>-<b>]/ mean, where <a> and <b> are multibyte
characters whose codepoints (and therefore sort order) differ in the
converted encoding?
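The next sketch shows the /[<a>-<b>]/ problem in a current Ruby. The characters "é" (U+00E9) and "€" (U+20AC) and the target encoding Windows-1252 are only examples. The two characters sort one way by Unicode codepoint and the other way in Windows-1252. Converting the pattern therefore turns a valid range into an empty one. compiles? is a hypothetical helper.

```ruby
# Example characters: "é" is U+00E9 and "€" is U+20AC in Unicode.
e_acute = "\u00E9"
euro    = "\u20AC"

# Hypothetical helper: true if +source+ compiles as a regular expression.
def compiles?(source)
  Regexp.new(source)
  true
rescue RegexpError
  false
end

# Unicode (UTF-8) order: é comes before €.
p [e_acute.ord, euro.ord]                    # => [233, 8364]

# Windows-1252 order: € is byte 0x80 and é is byte 0xE9, so the order flips.
p [e_acute.encode(Encoding::Windows_1252).ord,
   euro.encode(Encoding::Windows_1252).ord]  # => [233, 128]

utf8_pattern   = "[#{e_acute}-#{euro}]"
cp1252_pattern = utf8_pattern.encode(Encoding::Windows_1252)

# In UTF-8 the class is a valid range that contains, e.g., "ñ" (U+00F1).
p Regexp.new(utf8_pattern).match?("\u00F1")  # => true

# After conversion the class runs from 0xE9 down to 0x80, an empty range,
# which Ruby rejects with a RegexpError.
p compiles?(cp1252_pattern)                  # => false
```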

|(By the way, I am not talking about libraries here. As I have stressed  
|previously, libraries should be carefully written to either use ASCII  
|string literals only, or to make sure that it transcodes them properly.)

That makes me feel much better; it means we can limit the issue to
scripts only.
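For the library case, here is one possible way to make sure literals are transcoded properly. It assumes Ruby 2.x or later. The Label module and its tag method are hypothetical. The library keeps its non-ASCII literal in UTF-8, written with a \u escape so that it does not depend on the library file's source encoding. It converts that literal to the caller's encoding before combining the two.

```ruby
# Hypothetical library module; not part of any real gem.
module Label
  # \u escapes make this literal UTF-8 whatever the file's source encoding.
  SUFFIX = " (caf\u00E9)"

  # Appends SUFFIX to +str+. ASCII-only input needs no conversion; otherwise
  # the literal is transcoded to +str+'s encoding. Raises
  # Encoding::UndefinedConversionError if that encoding cannot represent "é".
  def self.tag(str)
    return str + SUFFIX if str.ascii_only?
    str + SUFFIX.encode(str.encoding)
  end
end

naive = "na\xEFve".b.force_encoding(Encoding::ISO_8859_1)  # "naïve" in ISO-8859-1
tagged = Label.tag(naive)
p tagged.encoding  # => #<Encoding:ISO-8859-1>
p tagged == "na\u00EFve (caf\u00E9)".encode(Encoding::ISO_8859_1)  # => true
```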

|Finally, are you suggesting that "-e" should perform differently to a  
|single-line ruby script? That seems non-intuitive to me.

-e takes its program from the command-line shell, which probably
yields strings in the locale encoding anyway.  But we cannot assume
that for scripts stored in files.
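Here is a sketch for checking this. It assumes Ruby 2.0 or later on a Unix-like system; on Windows, Ruby may choose the -e encoding differently. It runs the same one-line program once with -e and once from a file without a magic comment, and prints the source encoding each one gets. On such systems the -e program should report the locale encoding, while the file gets UTF-8, the default for scripts since Ruby 2.0. The helper names are hypothetical.

```ruby
require "rbconfig"
require "tempfile"

RUBY = RbConfig.ruby  # path to the currently running interpreter

# Hypothetical helper: source encoding given to a program passed via -e.
def e_script_encoding
  IO.popen([RUBY, "-e", "print __ENCODING__.name"], &:read)
end

# Hypothetical helper: source encoding given to the same program read from
# a file that has no magic comment.
def file_script_encoding
  Tempfile.create(["enc-demo", ".rb"]) do |f|
    f.write("print __ENCODING__.name\n")
    f.flush
    IO.popen([RUBY, f.path], &:read)
  end
end

puts "locale: #{Encoding.find('locale').name}"
puts "-e:     #{e_script_encoding}"
puts "file:   #{file_script_encoding}"
```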

							matz.