On Thu, 1 Aug 2002, Hal E. Fulton wrote:

> Seriously, since you have some expertise, I'm sure your knowledge will
> be valuable in improving Ruby... talk to vruz also.

I doubt it. My opinion of the matter is that the correct way to do things is to go with Unicode internally. (This does not rule out processing non-Unicode things, but you process them as binary byte-strings, not as character strings.) You lose a little bit of functionality this way, but overall it's easy, fast, and gives you everything you really need.

Unfortunately, a lot of Japanese programmers disagree with this. They feel the need, for example, to have separate code points for a single character, simply because one stroke is slightly different between the way Japanese and Chinese people write it. (The meaning is exactly the same.) They sometimes even feel the need to have the source language encoded within strings, rather than having only applications that need this information deal with it in their data formats. (It's not that there aren't uses for these sorts of features, but they are not useful enough to put the burden and overhead of them on every single program that wants to deal with a bit of text.)

Basically, if I18N is not going to be completely impossible, you're going to have to live with a bit of lossage when it comes to putting data into a computer, especially kanji data. But everybody suffers this loss: even in English we lived through all the days of ASCII without the ability to spell co-operate properly (with a diaeresis over the second 'o', instead of the hyphen). Or naive (diaeresis over the 'i'), for that matter. We lived.

Anyway, I've had it with that battle. Ruby gets what it gets, and maybe one day I'll be able to use it easily for I18N work, maybe not. In the meantime there's Perl and Java.

cjs
-- 
Curt Sampson <cjs / cynic.net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC