On Sep 19, 2008, at 4:52 AM, Yukihiro Matsumoto wrote:

>  UTF-16 is a nasty beast,
> but as I stated we have other beasts (dummy encodings), so that simply
> removing UTF-16 would help us little.  We have to do it consistently,
> if we do.

I'm no expert in any of this, but I wonder if part of the problem  
might be that Ruby tries to support all encodings both internally and  
externally. Might it be easier to support the full set externally, but  
to have a more limited set internally? For example, you could support  
UTF-16<any endian> as an external encoding, but transcode to UTF-8 on  
the way in. You could still support a rich variety of internal  
encodings, including the Asian ones you need. But you wouldn't have to  
deal with UTF-16 when implementing Regexp.escape :)  So, keep the
current set of encodings, but only allow a reasonable
(ASCII-compatible) subset as internal encodings.
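
To make the idea concrete, here is a rough sketch of what "transcode
on the way in" can look like from user code, using the
"external:internal" form of the File.open mode string. The helper
names (read_as_utf8, demo_roundtrip) and the file name are made up
for illustration, and I'm assuming the input is UTF-16LE. This only
shows the user-visible behaviour, not how the core would enforce a
restricted internal set.

```ruby
require "tmpdir"

# Hypothetical helper: read a file stored in some external encoding
# (assumed UTF-16LE here) and hand back a UTF-8 string.
# "rb:EXT:INT" asks IO to transcode from EXT to INT while reading.
def read_as_utf8(path, external = "UTF-16LE")
  File.open(path, "rb:#{external}:UTF-8") { |f| f.read }
end

# Hypothetical demo: write raw UTF-16LE bytes, read them back as
# UTF-8, and run the result through Regexp.escape.
def demo_roundtrip
  Dir.mktmpdir do |dir|
    path = File.join(dir, "utf16_demo.txt")
    File.binwrite(path, "a.b".encode("UTF-16LE"))
    text = read_as_utf8(path)
    [text.encoding, Regexp.escape(text)]
  end
end
```

With something like this, code such as Regexp.escape only ever sees
an ASCII-compatible string, which is the whole point of the
suggestion.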



Dave