Hi,
In message "Re: Unicode in Ruby now?"
on 02/08/02, Jan Witt <ontologist_2000 / yahoo.com> writes:
|I beg your pardon,
|you have all the wonderful standards at your
|fingertips.
Probably you're asking Curt, but I will answer what I can.
|What is meant by "representing" a character?
|What are the attributes of a code point?
A code point is a numeric index assigned to a character. "Representing" a
character means encoding it as a sequence of bytes, for example:
Japanese Hiragana "Ka" has the code point 9259 (0x242B) in JIS;
EUC encoded "Ka" is "\xa4\xab".
Japanese Hiragana "Ka" has the code point 12363 (0x304B) in Unicode;
UTF-8 encoded "Ka" is "\xe3\x81\x8b".
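The same distinction can be checked from Ruby itself. This is only a rough sketch, and it assumes a much newer Ruby (2.0 or later) whose String class has #encode, #ord and #bytes; the helper name hex_escape is made up for the illustration. It prints the code point and the encoded bytes of "Ka" for both character sets.

```ruby
# Sketch only: assumes Ruby 2.0 or later (String#encode, #ord, #bytes).
ka = "\u304b"  # Hiragana "Ka", written as a Unicode escape (string is UTF-8)

# The code point is just a number; the encoding decides which bytes store it.
unicode_code_point = ka.ord
utf8_bytes = ka.encode("UTF-8").bytes

euc_bytes = ka.encode("EUC-JP").bytes
# EUC-JP stores a JIS X 0208 character as its two JIS bytes with the high
# bit set, so subtracting 0x80 from each byte recovers the JIS code point.
jis_code_point = euc_bytes.map { |b| b - 0x80 }.inject(0) { |acc, b| (acc << 8) | b }

# Hypothetical helper: render a byte array in "\xNN" notation.
def hex_escape(bytes)
  bytes.map { |b| format("\\x%02x", b) }.join
end

puts "Unicode code point: #{unicode_code_point}"
puts "UTF-8 bytes:        #{hex_escape(utf8_bytes)}"
puts "JIS code point:     #{jis_code_point}"
puts "EUC-JP bytes:       #{hex_escape(euc_bytes)}"
```

The character is the same in all four lines of output; only the numbering scheme (JIS or Unicode) and the byte-level encoding (EUC or UTF-8) differ.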
|Outside of natural language text processing,
|are there areas where the parsing of
|non-Latin-1 strings is relevant? If so,
|what are they?
Yes, because some people in the world need it to represent their everyday
text. My mail, memos, journal, and almost everything else are written in
non-Latin-1 strings (EUC-JP).
matz.