At 16:55 08/09/10, Tanaka Akira wrote:
>In article <3119E5AB-AEC8-4FEE-B2FA-8C75482E0E9D / sun.com>,
>  Tim Bray <Tim.Bray / Sun.COM> writes:
>
>> Yes, there are lots of others.  For example, a full-text indexing
>> system dealing with a word like Québec, which needs to index it the
>> same whether the é appears as one codepoint or two.
>
>é is a character, even if it is represented as two
>codepoints.
>
>So ruby should treat it as a character.
>
>I know current ruby doesn't do that.  But it is desirable.

It is desirable in some situations, but not in others.
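
To make the two views concrete, here is a minimal sketch using the word Québec from the example above. It assumes a later Ruby than the one under discussion: String#unicode_normalize (Ruby 2.2 and up) and String#grapheme_clusters (Ruby 2.5 and up) are not available in current ruby, and the variable names are only illustrative.

```ruby
# Two encodings of the same word, written with \u escapes so the
# difference is visible in the source.
composed   = "Qu\u00E9bec"   # e-acute as one codepoint, U+00E9
decomposed = "Que\u0301bec"  # "e" followed by U+0301 COMBINING ACUTE ACCENT

# Low-level view: the codepoint sequences differ, so the strings differ.
composed_count   = composed.codepoints.length    # 6
decomposed_count = decomposed.codepoints.length  # 7
equal_as_is      = (composed == decomposed)      # false

# High-level view: both strings have six user-perceived characters.
# (Assumes Ruby 2.5+ for grapheme_clusters.)
composed_chars   = composed.grapheme_clusters.length    # 6
decomposed_chars = decomposed.grapheme_clusters.length  # 6

# One way an indexer could treat both forms alike: normalize to NFC
# before comparing. (Assumes Ruby 2.2+ for unicode_normalize.)
same_key = composed.unicode_normalize(:nfc) ==
           decomposed.unicode_normalize(:nfc)    # true
```

The character-level count suits the indexing case; the codepoint-level count is what a low-level tool would need.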


>NFC (Normalization Form C) can be a solution for é.  But
>there are characters which don't have single codepoint (as
>some characters defined in JIS X 0213, for example).
>
>I think codepoint is implementation details.  Although it
>may be useful for unicode experts, non-experts will be
>confused with the difference of characters and codepoints.

This is not a question of unicode experts vs. non-experts,
but a question of low level vs. high level. Somebody e.g.
writing a Punycode implementation doesn't have to be a
Unicode expert at all; s/he just needs access to the
codepoints.
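
As a rough illustration of that kind of low-level use, the sketch below does only the first step of a Punycode-style encoder: separating the basic (ASCII) code points, which are copied through literally, from the non-basic ones, which need encoding. split_basic is a hypothetical helper written for this example, not part of Ruby or of any Punycode library, and it assumes String#codepoints is available.

```ruby
# Hypothetical helper: split a string's codepoints into basic (ASCII)
# and non-basic ones. This is not a complete Punycode encoder.
def split_basic(str)
  str.codepoints.partition { |cp| cp < 0x80 }
end

basic, non_basic = split_basic("b\u00FCcher")  # "bücher"

literal_part = basic.pack("U*")  # "bcher", the part copied as-is
to_encode    = non_basic         # [252], i.e. U+00FC
```

Nothing here needs to know what a "character" is; plain integer codepoints are enough.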

>I think it should not be provided by default.

Are you saying that there should be a separately compiled
version of Ruby that supports #each_code[point]?
My guess would be that a sentence e.g. saying
"low-level access" would be enough to warn people,
if it's really a concern at all.

Regards,   Martin.



#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst / it.aoyama.ac.jp