On May 8, 2007, at 1:26 AM, Nanyang Zhan wrote:

> John Joyce wrote:
>
>> And yes, the overhead will be greater, but that's just a fact of
>> Unicode and large character sets like Chinese and Japanese.
>> You will also want to check which Chinese!
>> Chinese is split into two (politically safe) names: Traditional and
>> Simplified.
>> If you were doing Japanese text, separating English or other western
>> languages wouldn't be so easy, since Japanese essentially includes a
>> number of other languages' character sets in its Unicode set and in
>> everyday usage.
>
> You are right. And characters aside, there is a different set of
> punctuation marks too!
>
> So, you don't think there is a doc about the number range that string[0]
> returns for a given language?
>
> I wonder what those numbers mean...
>
>
> -- 
> Posted via http://www.ruby-forum.com/.
>
There is a doc.
Go to www.unicode.org.
There should be a PDF (many, actually) of the code charts.
I don't know if the two main Chinese sets are encoded as different
ranges or simply declared in some way.
In general, in Unicode a character is the same character even when it
appears in a different language.
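
To see what those numbers are, here is a minimal sketch rather than a
definitive answer. In Ruby 1.8, string[0] returns a single byte of the
encoded string, not a whole character, so for UTF-8 text it helps to
decode to code points with unpack("U*") first. The helper names
(code_points, han?) and the CJK_UNIFIED constant are made up for this
example, and the range covers only the basic CJK Unified Ideographs
block (U+4E00..U+9FFF), not the extension blocks or CJK punctuation.
The "\u" escapes assume Ruby 1.9 or later.

```ruby
# Assumption: only the basic CJK Unified Ideographs block is checked.
CJK_UNIFIED = (0x4E00..0x9FFF)

# Decode a UTF-8 string into an array of integer code points.
def code_points(str)
  str.unpack("U*")
end

# True if the code point falls inside the basic Han block above.
def han?(code_point)
  CJK_UNIFIED.include?(code_point)
end

# U+56FD is the simplified form and U+570B the traditional form of the
# same word; "a" is plain ASCII for comparison.
text = "\u56FD\u570Ba"
code_points(text).each do |cp|
  puts format("U+%04X han=%s", cp, han?(cp))
end
```

Both the simplified and the traditional form land in the same block
here, so a range check like this can tell Han characters from Latin
ones, but it cannot tell you which kind of Chinese (or Japanese) the
text is.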