There is documentation:

    ri String#[]

although it is a little vague about what "character code" means.

By default (in Ruby 1.8.x) the number returned by some_string[i] is a Fixnum in the range [0,255], even for UTF-8 encoded strings. Ruby just treats the string as a sequence of 8-bit bytes and gives you back whichever byte you asked for.

    irb(main):001:0> s = "大智若愚"
    => "\345\244\247\346\231\272\350\213\245\346\204\232"
    irb(main):002:0> s[0]
    => 229
    irb(main):003:0> s.length
    => 12

So s[0] is just the first byte of the first character, and the length of 12 is a byte count: four characters at three bytes each in UTF-8.

On May 8, 12:26 am, Nanyang Zhan <s... / hotmail.com> wrote:
> John Joyce wrote:
> > And yes, the overhead will be greater, but that's just a fact of
> > unicode and large character sets like chinese and japanese.
> > You will also want to check which chinese!
> > Chinese is split into two (politically safe) names : Traditional and
> > Simplified.
> > If you were doing Japanese text, separating English or other western
> > languages wouldn't be so easy, since Japanese essentially includes a
> > number of other languages' character sets in its unicode set and in
> > everyday usage.
>
> You are right. And let alone the characters, there is a different set of
> punctuations!
>
> So, you don't think there is a doc about the number range string[0]
> return with a specified language?
>
> I wonder what those number mean...
>
> --
> Posted via http://www.ruby-forum.com/.
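
To make the byte-versus-character point at the top of this reply concrete, here is a minimal sketch. It reuses the escaped bytes from the irb output above and assumes they are valid UTF-8. The variable names `bytes` and `codepoints` are mine, for illustration only. `unpack("C*")` returns the raw bytes (what `s[i]` hands back on 1.8), while `unpack("U*")` decodes the same bytes as UTF-8 and returns one Unicode code point per character.

```ruby
# The UTF-8 bytes printed by irb in the session above, written as octal escapes.
s = "\345\244\247\346\231\272\350\213\245\346\204\232"

bytes      = s.unpack("C*")  # one integer per byte, each in 0..255
codepoints = s.unpack("U*")  # one integer per UTF-8 character (a Unicode code point)

puts bytes.first        # 229 (the same value s[0] returns on 1.8)
puts bytes.length       # 12
puts codepoints.length  # 4

# Print the code points in the usual U+XXXX notation.
puts codepoints.map { |c| "U+%04X" % c }.join(" ")
# U+5927 U+667A U+82E5 U+611A
```

The numbers from `unpack("U*")` are the ones that mean something per character: they can be looked up in the Unicode charts. The 0..255 values are only pieces of the encoding of a character.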