At 02:35 08/09/11, Tim Bray wrote:
>On Sep 10, 2008, at 12:55 AM, Tanaka Akira wrote:

>> NFC (Normalization Form C) can be a solution for "ƥ".  But
>> there are characters which don't have a single codepoint
>> (such as some characters defined in JIS X 0213).
>
>Unfortunately NFC isn't a solution because it isn't widely respected,  
>so a developer has to deal with nonstandard normalizations. :(

I think what Akira meant here is that you should use some kind
of normalization (e.g. NFC) as a preprocessing step, which
would avoid the need to map the various precomposed and
decomposed forms to the same entry in your actual indexing code.
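For illustration, a minimal Python sketch of normalizing as a
preprocessing step, using the standard unicodedata module; index_key
and the sample strings are hypothetical. It also shows the limitation
Akira raised: か゚ (HIRAGANA LETTER KA followed by the combining
semi-voiced sound mark), a JIS X 0213 character, has no precomposed
code point, so it is still two code points after NFC.

```python
import unicodedata

def index_key(term: str) -> str:
    # Hypothetical helper: normalize to NFC once, up front, so that
    # precomposed and decomposed spellings collapse to one index key.
    return unicodedata.normalize("NFC", term)

precomposed = "caf\u00e9"   # 'é' as one code point, U+00E9
decomposed = "cafe\u0301"   # 'e' + COMBINING ACUTE ACCENT (U+0301)

index = {}
for word in (precomposed, decomposed):
    index.setdefault(index_key(word), []).append(word)

print(len(index))  # 1: both spellings share one entry

# HIRAGANA LETTER KA (U+304B) + COMBINING KATAKANA-HIRAGANA SEMI-VOICED
# SOUND MARK (U+309A) has no precomposed form, so NFC keeps two code points.
ka_semi = index_key("\u304b\u309a")
print(len(ka_semi))  # 2
```

The indexing code itself then only ever sees one form per string,
but it cannot assume one code point per user-perceived character.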

I think that's true, but it doesn't address the fact that
in text indexing you often also want to link an entry to
its non-accented version and so on, so one way or another
you always end up having to look closely at each
character/codepoint anyway.
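One common way to get such a non-accented version, sketched in Python
under the assumption that "accent" means any combining mark left after
canonical decomposition (fold_accents and index_keys are hypothetical
names, not part of any library):

```python
import unicodedata

def fold_accents(text: str) -> str:
    # Decompose (NFD) so base letters and marks become separate code
    # points, drop every combining mark, then recompose the rest (NFC).
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)

def index_keys(term: str) -> set[str]:
    # Hypothetical helper: file each term under its NFC form and its
    # accent-folded form, so "résumé" can also be found via "resume".
    exact = unicodedata.normalize("NFC", term)
    return {exact, fold_accents(exact)}

print(fold_accents("r\u00e9sum\u00e9"))  # resume
```

The sketch also shows why each character still needs a close look:
it turns が into か, which changes a Japanese word rather than dropping
an accent, and it leaves ø and ł untouched because they have no
canonical decomposition.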

Regards,    Martin.



#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst / it.aoyama.ac.jp