Hi,

You might at first glance think that this post should go to ruby-dev, but  
please read to the end!

I have been pulling my hair out trying to convert a relatively simple app  
to support m17n under Ruby 1.9 to see what is involved. I need to support  
all common locales worldwide, and data can also be stored in UTF-8 or  
UTF-16. I was hoping that Ruby 1.9 was going to take the hard work out of  
this for me. It has to a certain extent, but UTF-16 is the problem - it  
breaks so many things, due to its "ASCII incompatibility" (using Ruby's  
definition). I can't even do simple things like pull out fields and  
substitute into another string without testing "encoding compatibility".  
Something as simple as:

	puts "The value is #{val}"

fails with an encoding compatibility error if val is UTF-16 data.
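
Here is a minimal sketch of the failure and of the transcoding workaround. It assumes the exception raised is Encoding::CompatibilityError, as in current Ruby releases. The `describe` method and the sample value are hypothetical and only there for illustration:

```ruby
# Hypothetical helper: returns the interpolated message, or the name of
# the error class if the interpolation is rejected.
def describe(val)
  "The value is #{val}"
rescue Encoding::CompatibilityError => e
  e.class.name
end

utf16 = "42".encode("UTF-16LE")        # sample UTF-16 data (assumed value)

puts utf16.encoding.ascii_compatible?  # UTF-16LE is not ASCII-compatible
puts describe(utf16)                   # interpolation into the literal fails
puts describe(utf16.encode("UTF-8"))   # transcoding first avoids the error
```

Transcoding once, as soon as the data arrives, means the rest of the program never has to test for encoding compatibility.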

At one stage I got so frustrated that I was even thinking about going back  
to Python :-(
So I have ended up transcoding any UTF-16 data to UTF-8, and now things  
are going much better.

Maybe I am doing something wrong - if so please suggest something I can do  
other than transcode the UTF-16.

But this has led me to look back at the issues with UTF-16 I have hit,
and to think about all the internal code in Ruby to handle "ASCII  
incompatible" encodings, and the overhead involved with supporting it.

And I think that other Ruby programmers may end up doing what I have done  
- avoid using UTF-16 internally because it is too hard.

So my radical suggestion is this:

Remove internal support for non-ASCII-compatible encodings completely,
and when reading/writing UTF-16 (and UTF-32) files automatically
transcode to/from UTF-8.

My reasons:

- String & Regexp operations should just "work" without the programmer
worrying about encoding compatibility (I think!)
- The programmer only has to think about character encodings at the  
"interfaces" (files, network interfaces) not throughout the program logic
- To my knowledge UTF-16 & UTF-32 are the only "non-ASCII compatible"
encodings, as Ruby defines it
- To my knowledge no one actually uses UTF-16 or UTF-32 as a locale
encoding
- I would avoid having to use ugly modes to open a file like  
"r:UTF-16LE:UTF-8" (very minor)
- Ruby's internal code would be simpler & cleaner and therefore probably  
faster and easier to maintain
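
For reference, this is roughly what transcoding at the file interface looks like today with the "r:UTF-16LE:UTF-8" mode mentioned above. It is only a sketch: the `read_utf16le_as_utf8` method, the file name and the sample contents are all made up for the example.

```ruby
require "tmpdir"

# Hypothetical helper: reads a UTF-16LE file and returns its contents
# transcoded to UTF-8 (external encoding UTF-16LE, internal UTF-8).
def read_utf16le_as_utf8(path)
  File.open(path, "r:UTF-16LE:UTF-8") { |f| f.read }
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "sample.txt")   # made-up file name
  # Write raw UTF-16LE bytes so there is something to read back.
  File.binwrite(path, "value=42\n".encode("UTF-16LE"))

  text = read_utf16le_as_utf8(path)
  puts text.encoding                    # the string is UTF-8 from here on
  puts "The value is #{text.chomp}"     # ordinary interpolation now works
end
```

With the proposal, the internal half of the mode would be implied, so the program would only name the file's own encoding.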

Maybe I have got this all wrong - I am relatively new to m17n!

Cheers
Mike