As far as I understand, that was the original plan.

The question is how exactly to distinguish internal
and external encodings. Should we e.g. allow "UTF-16BE"
in a mode when opening a file, but not as an argument to
String#encode? But then what if you want to convert to
UTF-16BE and then use some compression (gzip,...) on
output?
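
To make that last case concrete, here is a rough sketch. It assumes a recent Ruby (Zlib.gzip and Zlib.gunzip appeared in 2.4), and the helper names are made up for illustration, not an existing API. It shows "UTF-16BE" used both as an argument to String#encode, with gzip applied afterwards, and as the external encoding in a file mode.

```ruby
require "zlib"

# Hypothetical helper: transcode to UTF-16BE with String#encode,
# then gzip the resulting bytes.
def gzip_as_utf16be(text)
  Zlib.gzip(text.encode(Encoding::UTF_16BE))
end

# Hypothetical helper: the reverse direction. Zlib.gunzip returns a
# binary string, so it is relabeled as UTF-16BE before transcoding.
def gunzip_from_utf16be(data)
  Zlib.gunzip(data).force_encoding(Encoding::UTF_16BE).encode(Encoding::UTF_8)
end

# Hypothetical helper: "UTF-16BE" as the external encoding in the file
# mode, transcoded to UTF-8 (the internal encoding) while reading.
def read_utf16be_file(path)
  File.open(path, "r:UTF-16BE:UTF-8", &:read)
end
```

If "UTF-16BE" were only accepted in file modes, the first helper would need some other way to obtain the UTF-16BE bytes before compressing them.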

I think that once we had reached that point, each encoding
was simply supported to whatever extent was easy. As an
example, Oniguruma already supported UTF-16 (BE/LE) and
so on, so those are usable now.

The alternative, which is suggested by this discussion,
is that we decide on a (pretty high) minimum standard for
support for an encoding. All encodings that don't reach
that standard are simply declared dummy and behave as
such (i.e. the same as binary, or even with less functionality).
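
For reference, this is roughly how the distinction looks from Ruby code today. It is only a sketch: encoding_support is a made-up helper, and the two predicates it reports are not necessarily the "minimum standard" meant above.

```ruby
# Hypothetical helper (not part of Ruby): report the two predicates
# that Encoding objects expose about their level of support.
def encoding_support(name)
  enc = Encoding.find(name)
  { dummy: enc.dummy?, ascii_compatible: enc.ascii_compatible? }
end

encoding_support("UTF-8")        # not dummy, ASCII-compatible
encoding_support("UTF-16BE")     # not dummy, but not ASCII-compatible
encoding_support("ISO-2022-JP")  # dummy

# UTF-16BE is a real encoding: characters are counted, not bytes.
utf16 = "ab".encode(Encoding::UTF_16BE)
utf16.length                     # 2 characters, stored in 4 bytes

# A string labeled with a dummy encoding behaves like binary:
# every byte counts as one character.
jis = "\e$B$3$s\e(B".b.force_encoding(Encoding::ISO_2022_JP)
jis.length == jis.bytesize       # true
```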

This would force at least those who understand the issues
to use conversion. But there would still be those who
might operate on a string labeled "UTF-16BE" under
the impression that this actually works.

It would also mean that each application has to do some
work to distinguish 'really supported' and 'dummy label'
encodings. Or that (as you suggest) conversion would be
automatic, which should work for Unicode-based encodings,
but which might bring up very subtle issues e.g. when
converting from iso-2022-jp to euc-jp (or do you choose
shift_jis?).
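
A minimal sketch of the per-application work, under the assumption that "really supported" means "not dummy and ASCII-compatible". The helper name and its default target are invented for illustration. The choice of target is left to the caller, and the sketch does not address the mapping subtleties between the Japanese encodings.

```ruby
# Hypothetical helper: return the string unchanged if its encoding is
# "really supported" (assumed here: not dummy and ASCII-compatible),
# otherwise transcode it to a target chosen by the caller.
def to_working_encoding(str, target: Encoding::UTF_8)
  enc = str.encoding
  return str if !enc.dummy? && enc.ascii_compatible?
  str.encode(target)
end

# Two hiragana characters in ISO-2022-JP (escape sequences plus 7-bit bytes).
jis = "\e$B$3$s\e(B".b.force_encoding(Encoding::ISO_2022_JP)

# The application has to pick the target itself: EUC-JP, Shift_JIS, ...
euc  = to_working_encoding(jis, target: Encoding::EUC_JP)
sjis = to_working_encoding(jis, target: Encoding::Shift_JIS)

# For Unicode-based encodings the default target is the obvious choice.
utf8 = to_working_encoding("ab".encode(Encoding::UTF_16BE))
```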

Regards,    Martin.

At 22:40 08/09/19, Dave Thomas wrote:
>
>On Sep 19, 2008, at 4:52 AM, Yukihiro Matsumoto wrote:
>
>>  UTF-16 is a nasty beast,
>> but as I stated we have other beasts (dummy encodings), so that simply
>> removing UTF-16 would help us little.  We have to do it consistently,
>> if we do.
>
>I'm no expert in any of this, but I wonder if part of the problem  
>might be that Ruby tries to support all encodings both internally and  
>externally. Might it be easier to support the full set externally, but  
>to have a more limited set internally? For example, you could support  
>UTF-16<any endian> as an external encoding, but transcode to UTF-8 on  
>the way in. You could still support a rich variety of internal  
>encodings, including the Asian ones you need. But you wouldn't have to  
>deal with UTF-16 when implementing Regexp#escape :)  So, keep the  
>current set of encodings, but only allow a reasonable
>(ASCII-compliant) subset as internal encodings.
>
>
>
>Dave
>


#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst / it.aoyama.ac.jp