On 17-jun-2006, at 23:55, Michal Suchanek wrote:

>
> First, for reasons of efficiency. If an application is going to perform
> lots of slicing and poking on strings, it will want an encoding that
> is suitable for that, such as UTF-32.
I would much prefer UTF-8 in a language such as Ruby, which is often
used as glue between other systems. UTF-8 is the encoding used for
interchange; that is indisputable. If you go for UTF-16 or UTF-32, you
will most likely have to convert every single character of the text
files you read (among text files in the wild, AFAIK, UTF-16 and UTF-32
are a minority, thanks to the BOM and other setbacks).

> If an application runs on a system
> with little memory, it will want a space-efficient encoding (i.e. UTF-8,
> or UTF-16 for Asian languages). And if an application runs on a system
> that uses some legacy codepage, it can read, write, and process all
> strings in that codepage. And in JRuby it will be useful to convert
> strings to UTF-16 so that the native Java functions can be used for
> manipulation.
>
> In your model you can modify Ruby to use
> strings composed of TRON characters instead of Unicode characters. But
> how would Unicode Ruby and TRON Ruby exchange strings?

I think Alan Little summed it up very well. The problem with Unicode
in Ruby is the striving for perfection (i.e. trying to satisfy the
users of every conceivable or needed encoding). It's very noble, but I
personally can't imagine it (even with the "democratic coerce"
approach Austin cited). The only thing I don't know is whether a
system with this kind of handling can be built at all, and how it
will interoperate.

Up until now, all the scripting languages I have used to some extent
(Perl, Python, Ruby) allowed any encoding in strings, and doing
Unicode in them hurts.

Bluntly put, I am selfish and I don't believe in the "saving grace"
of M17N (because I just can't wrap my head around it, and I sure as
hell know it's going to be VERY complex).
It's also what bothers me most about Ruby's "unicode discussions"
(I've read all of them on this list dating back to 2002, because I
need it to work NOW): they always turn into this kind of religious
discussion in the spirit of "but your encoding is not good enough",
"but my bad encoding isn't that one and I still need it to work", etc.

For me, though, the greatest thing about Unicode is that it's Just
Good Enough. And it doesn't seem that Unicode is THAT useless for CJK
languages either (although I'm sure Paul can correct me: all 4
languages I have command of use only 2 writing systems, with some odd
additions here and there).

And no, I haven't had a chance to see a TRON system in the wild. If
someone could show me one within 200 km, I would be glad to take a
look.
--
Julian 'Julik' Tarkhanov
please send all personal mail to
me at julik.nl