Hi,

In message "Re: [ruby-core:18640] Character encodings - a radical suggestion"
    on Wed, 17 Sep 2008 10:20:13 +0900, "Michael Selig" <michael.selig / fs.com.au> writes:

|So my radical suggestion is this:
|
|Remove internal support for non-ASCII encodings completely, and when  
|reading/writing UTF-16 (and UTF-32) files automatically transcode to/from  
|UTF-8.

What happens to non-Unicode text under your suggestion?

My conservative suggestion is this:

Pass "r:UTF-16BE:UTF-8" as the mode when you open a UTF-16 file for
reading, so that all your internal strings are UTF-8 encoded.
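
A minimal sketch of that mode string in use. The temporary file name
and the sample text are made up for illustration; only the
"r:UTF-16BE:UTF-8" mode comes from the suggestion above.

```ruby
require "tmpdir"

# Hypothetical sample file; the name and contents are for illustration only.
path = File.join(Dir.tmpdir, "utf16be_sample_#{Process.pid}.txt")

# Write three Japanese characters as raw UTF-16BE bytes (no BOM).
File.binwrite(path, "\u65E5\u672C\u8A9E".encode("UTF-16BE"))

# External encoding UTF-16BE, internal encoding UTF-8:
# the data is transcoded while it is read.
text = File.open(path, "r:UTF-16BE:UTF-8") { |f| f.read }
File.delete(path)

puts text.encoding        # UTF-8
puts text =~ /\u672C/     # 1 (character index of the match)
```

After the read, the string behaves like any other UTF-8 string, so
ordinary String and Regexp operations apply without further
conversion.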

|My reasons:
|
|- String & Regexp operations should just "work" without the programmer  
|worrying about encoding compatibility (I think!)
|- The programmer only has to think about character encodings at the  
|"interfaces" (files, network interfaces) not throughout the program logic

My "suggestion" satisfies both of the above.

|- To my knowledge UTF-16 & UTF-32 are the only "non-ASCII compatible" as  
|Ruby defines it

As akr stated, this is wrong.
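
One way to check this on a given Ruby build is to list the encodings
that report themselves as not ASCII-compatible. This is only a
sketch: it assumes a Ruby that provides Encoding#ascii_compatible?,
and the exact list depends on the version and on which encodings are
built in. On builds I know of, it contains more than the UTF-16 and
UTF-32 family, for example dummy encodings such as ISO-2022-JP.

```ruby
# Names of all encodings this Ruby build reports as not ASCII-compatible.
non_ascii = Encoding.list.reject(&:ascii_compatible?).map(&:name)

puts non_ascii.sort
```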

|- To my knowledge no one actually uses UTF-16 or UTF-32 as a locale

Yes.

|- I would avoid having to use ugly modes to open a file like  
|"r:UTF-16LE:UTF-8" (very minor)

This is ugly indeed.  We might add more Unicode support in the
future, but we are in no hurry.

|- Ruby's internal code would be simpler & cleaner and therefore probably  
|faster and easier to maintain

Dropping UTF-{16,32} is not enough.  Unless we abandon non-Unicode
encoding support altogether, it won't be THAT simple.  And I am not
going to remove support for them.  I use them every day.

							matz.