There is documentation:

ri String#[]

It is a little vague about what "character code" means, though.  By
default (in ruby 1.8.x) the number returned by some_string[i] is a
Fixnum in the range [0,255], even for UTF-8 encoded strings.  Ruby
just treats the string as a sequence of 8-bit bytes and gives you
back whichever byte you asked for, so the range does not depend on
the language of the text.

irb(main):001:0> s = "大智若愚"
=> "\345\244\247\346\231\272\350\213\245\346\204\232"
irb(main):002:0> s[0]
=> 229
irb(main):003:0> s.length
=> 12
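
If what you are after is the Unicode code point of each character
rather than the raw bytes, one option is String#unpack with the "U*"
directive. This is only a sketch, and it assumes the string really
holds valid UTF-8; the variable names are mine. The string is written
with the same byte escapes irb printed above, so it does not depend on
your terminal or source encoding.

```ruby
# The same 12 bytes as in the irb session (UTF-8 for four Chinese characters).
s = "\345\244\247\346\231\272\350\213\245\346\204\232"

# "C*" unpacks every byte as an unsigned 8-bit integer,
# i.e. the values s[i] hands back one at a time in ruby 1.8.
bytes = s.unpack("C*")

# "U*" decodes the bytes as UTF-8 and yields one code point per character.
codepoints = s.unpack("U*")

puts bytes.length                                    # 12
puts bytes.first                                     # 229
puts codepoints.length                               # 4
puts codepoints.map { |c| "U+%04X" % c }.join(" ")   # U+5927 U+667A U+82E5 U+611A
```

Each character here takes three bytes in UTF-8, which is why s.length
reports 12 for a four-character string.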

On May 8, 12:26 am, Nanyang Zhan <s... / hotmail.com> wrote:
> John Joyce wrote:
> > And yes, the overhead will be greater, but that's just a fact of
> > unicode and large character sets like chinese and japanese.
> > You will also want to check which chinese!
> > Chinese is split into two (politically safe) names :  Traditional and
> > Simplified.
> > If you were doing Japanese text, separating English or other western
> > languages wouldn't be so easy, since Japanese essentially includes a
> > number of other languages' character sets in its unicode set and in
> > everyday usage.
>
> You are right. And characters aside, there is a different set of
> punctuation marks!
>
> So, you don't think there is a doc about the number range string[0]
> returns for a specified language?
>
> I wonder what those numbers mean...
>
> --
> Posted via http://www.ruby-forum.com/.