On Thu, 1 Aug 2002, Hal E. Fulton wrote:

> Seriously, since you have some expertise, I'm sure your knowledge will
> be valuable in improving Ruby... talk to vruz also.

I doubt it. My opinion on the matter is that the correct way to do
things is to use Unicode internally. (This does not rule out
processing non-Unicode data, but you process it as binary
byte-strings, not as character strings.) You lose a little
functionality this way, but overall it's easy, fast, and gives you
everything you really need.
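
Concretely, here's a rough sketch of the model I mean. The string
API below is hypothetical -- strings that know whether they hold
characters or raw bytes -- and I've borrowed UTF-8 as the internal
form; Ruby has nothing like this today, so take the method names as
illustration only:

    text  = "coöperate"          # character string, Unicode internally
    bytes = "\x82\xA0".b         # byte-string: just bytes, no characters

    puts text.encoding           # => UTF-8
    puts text.length             # 9: counts characters, not bytes
    puts bytes.encoding          # => ASCII-8BIT, i.e. binary

    # Legacy data becomes characters only by explicit conversion at
    # the boundary; until then you process it as binary.
    puts bytes.force_encoding("Shift_JIS").encode("UTF-8")   # あ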

Unfortunately, a lot of Japanese programmers disagree with this. They
feel the need, for example, to have separate code points for a single
character, simply because one stroke is written slightly differently
in Japanese and in Chinese. (The meaning is exactly the same.)
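
To make that concrete: 直 (U+76F4) is one of the characters usually
cited; Japanese and Chinese typography draw one of its strokes
differently, but Unicode unifies both forms into a single code point
and leaves the difference to fonts. In the same hypothetical
character-aware Ruby:

    c = "直"                     # U+76F4, unified across Japanese and Chinese
    printf("U+%04X\n", c.ord)    # => U+76F4: one code point either way
    puts c.length                # => 1 character...
    puts c.bytesize              # => ...in 3 bytes of UTF-8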

They sometimes even feel the need to have the source language encoded
within the strings themselves, rather than letting the applications
that actually need that information carry it in their own data
formats. (It's not that there are no uses for these sorts of
features, but they are not useful enough to justify putting their
burden and overhead on every single program that wants to deal with a
bit of text.)
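
When an application really does need the source language, the place
for it is metadata alongside the string in that application's own
data model, not inside every string in the system. A sketch
(TaggedText is something I just made up, not a real API):

    TaggedText = Struct.new(:lang, :text)

    greeting = TaggedText.new("ja", "こんにちは")
    puts greeting.lang           # the language rides alongside the string...
    puts greeting.text           # ...which stays a plain Unicode string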

Basically, if I18N is to be possible at all, you're going to have to
live with a bit of lossage when it comes to putting data into a
computer, especially kanji data. But everybody suffers this loss:
even in English we lived through all the days of ASCII without the
ability to spell co-operate properly (with a diaeresis over the
second 'o' instead of the hyphen). Or naive (diaeresis over the 'i'),
for that matter. We lived.

Anyway, I've had it with that battle. Ruby gets what it gets, and
maybe one day I'll be able to use it easily for I18N work, maybe not.
In the meantime there are Perl and Java.

cjs
-- 
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light.  --XTC