On May 8, 2007, at 1:26 AM, Nanyang Zhan wrote:

> John Joyce wrote:
>
>> And yes, the overhead will be greater, but that's just a fact of
>> Unicode and large character sets like Chinese and Japanese.
>> You will also want to check which Chinese!
>> Chinese is split into two (politically safe) names: Traditional and
>> Simplified.
>> If you were doing Japanese text, separating English or other western
>> languages wouldn't be so easy, since Japanese essentially includes a
>> number of other languages' character sets in its Unicode set and in
>> everyday usage.
>
> You are right. And apart from the characters, there is a different set of
> punctuation!
>
> So, you don't think there is a doc about the range of numbers string[0]
> returns for a specified language?
>
> I wonder what those numbers mean...
>
> --
> Posted via http://www.ruby-forum.com/.

There is a doc. Go to www.unicode.org; there should be a PDF (many, actually).
I don't know whether the two main Chinese sets are encoded as different ranges or simply declared in some other way. In general, in Unicode a character is the same character even when it appears in a different language.
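As for what the numbers mean: in Ruby 1.8, `string[0]` returns the first *byte* of the string as an Integer, so for UTF-8 Chinese text you get the lead byte of a multibyte sequence (228 for "中"), not the character's code point. Below is a minimal sketch, assuming Ruby 1.9 or later and UTF-8 source text, that decodes code points with `unpack("U*")` and looks them up in a few Unicode blocks. The `CJK_RANGES` table, `block_of`, and `code_points` are hypothetical helpers written for illustration; the table is a small hand-picked subset of the blocks in the Unicode Standard, not a complete list (it omits, for example, the CJK extension blocks).

```ruby
# encoding: utf-8
# Sketch (assumes Ruby 1.9+): map characters to a FEW Unicode blocks.
# CJK_RANGES is a hypothetical, non-exhaustive subset of the Unicode blocks.
CJK_RANGES = {
  "Basic Latin"                   => 0x0000..0x007F,
  "CJK Symbols and Punctuation"   => 0x3000..0x303F, # e.g. "。" ideographic full stop
  "Hiragana"                      => 0x3040..0x309F,
  "Katakana"                      => 0x30A0..0x30FF,
  "CJK Unified Ideographs"        => 0x4E00..0x9FFF, # shared by Chinese and Japanese
  "Halfwidth and Fullwidth Forms" => 0xFF00..0xFFEF  # e.g. "，" fullwidth comma
}

# Decode a UTF-8 string into its Unicode code points.
def code_points(str)
  str.unpack("U*")
end

# Name of the block (from the subset above) containing the first character,
# or "other" if it falls outside every listed range.
def block_of(char)
  cp = char.unpack("U").first
  name, _range = CJK_RANGES.find { |_, range| range.include?(cp) }
  name || "other"
end

if __FILE__ == $0
  # Print each character with its code point and block name.
  "中文，English。".each_char do |ch|
    printf("%s U+%04X %s\n", ch, ch.unpack("U").first, block_of(ch))
  end
end
```

In this sketch the Simplified "汉" and the Traditional "漢" have different code points but land in the same block, so a range check alone does not tell Simplified text from Traditional text. Chinese punctuation, on the other hand, mostly falls into the CJK Symbols and Punctuation and Halfwidth and Fullwidth Forms blocks, which keeps it apart from the ideographs.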