At 00:01 08/09/18, Yukihiro Matsumoto wrote:
>Hi,
>
>In message "Re: [ruby-core:18663] Re: Character encodings - a radical suggestion"
>    on Wed, 17 Sep 2008 23:09:32 +0900, Matthias Wächter
><matthias / waechter.wiz.at> writes:
>
>|Is there a complete characterization of this whole problem? It seems
>|to be the main reason for sticking to non-UTF-8 character sets in
>|Ruby these days, and concluding from what I have read about it, a
>|solution could be the addition of missing characters/codepoints to
>|Unicode. Why does no one consider going that way, instead of building
>|a complicated stack of conversion functions on top?
>
>Just because it's impossible.  History sucks.  We have mixed up YEN
>SIGN and REVERSE SOLIDUS for a long time.  They cannot be distinguished
>without context information.  Technically 0x5c should mean REVERSE
>SOLIDUS, but that is not always what it means to humans.

Thanks for putting it so bluntly. The Europeans did similar things
in the ISO 646 age (7-bit encodings with national variants), but were
fortunate enough to go through an intermediate stage of 8-bit encodings
before going multibyte.
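
To make the 0x5c point concrete, here is a minimal Ruby sketch. It is
only an illustration: the helper names utf8_bytes and
candidate_codepoints are hypothetical, and the sketch models just the
one byte discussed above. Unicode keeps the two characters apart as
U+005C and U+00A5, but a stored legacy byte 0x5C does not say which
one its author meant.

```ruby
# Unicode assigns the two characters separate code points.
REVERSE_SOLIDUS = 0x005C.chr(Encoding::UTF_8)  # backslash
YEN_SIGN        = 0x00A5.chr(Encoding::UTF_8)  # yen sign

# Hypothetical helper: the UTF-8 byte sequence of a string.
def utf8_bytes(str)
  str.encode(Encoding::UTF_8).bytes
end

# Hypothetical helper: the code points a legacy byte may have meant.
# Simplification: only 0x5C is modelled as ambiguous in this sketch.
def candidate_codepoints(byte)
  byte == 0x5C ? [0x005C, 0x00A5] : [byte]
end

puts format("U+005C as UTF-8: %p", utf8_bytes(REVERSE_SOLIDUS))
puts format("U+00A5 as UTF-8: %p", utf8_bytes(YEN_SIGN))
puts format("legacy 0x5C may be: %p", candidate_codepoints(0x5C))
```

Once text is in Unicode the two are distinct, but choosing between the
candidates for old data still needs context that the byte itself does
not carry.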


>Besides that, Unicode is not a panacea.

Definitely not. But it makes a lot of things a lot easier for a
lot of people.


>Some character sets
>(e.g. GB18030 for Chinese characters) are even bigger than Unicode.
>In fact, GB18030 is a superset of Unicode.

How exactly? I know that the Chinese government is requiring
GB 18030 support for software sold in China, and that the Unicode
Consortium and all the companies involved have been working hard to
make sure that this requirement is met by converting from and to
Unicode so that applications can use Unicode internally.
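
As a rough sketch of that "Unicode internally, GB 18030 at the
boundary" arrangement in Ruby: the helper names from_gb18030 and
to_gb18030 are hypothetical, and the sample text is just two common
Han characters written with \u escapes. It assumes a Ruby build that
ships the GB18030 transcoder, and it only shows a round trip for these
characters, not a claim about every code point.

```ruby
# Hypothetical boundary helpers: GB 18030 outside, UTF-8 inside.
def from_gb18030(bytes)
  # Label the incoming bytes as GB18030, then transcode to UTF-8.
  bytes.dup.force_encoding(Encoding::GB18030).encode(Encoding::UTF_8)
end

def to_gb18030(text)
  # Transcode internal UTF-8 text for a GB 18030 consumer.
  text.encode(Encoding::GB18030)
end

internal = "\u4E2D\u6587"         # two Han characters, held as UTF-8
external = to_gb18030(internal)   # bytes as they would leave the program

puts external.bytes.map { |b| format("%02X", b) }.join(" ")
puts from_gb18030(external) == internal
```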


>|To some extent, it looks like 'some' people like insisting on the
>|status quo as it makes them feel special, swimming up the
>|Unicode waterfall, holding on to regional locales instead of solving
>|the issue. I explicitly do not refer to Ruby or its developers; they
>|just accept these special needs more than other computer language
>|designers with less sympathy for this anomaly.
>|
>|Nevertheless, a lasting fix is needed, and I think writing more
>|and more crutches for encoding conversion goes the wrong way. This
>|might still be needed for legacy file support, but day-to-day work
>|should not have to deal with this issue so prominently.
>
>You are free to feel so, but it's us who take up the burden.

I can't speak for Matz, but I think anybody who wants to share
some burden by providing patches and such is also very welcome,
although some of the issues discussed on this list are not yet
at the level where somebody could write a patch.


Regards,    Martin.



#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst / it.aoyama.ac.jp