Tim Bray <tbray / textuality.com> writes:

>> You need glyphs, and some glyphs can be produced with multiple code
>> points (e.g., LOWERCASE A + COMBINING ACUTE ACCENT as opposed to
>> A ACUTE).
>
> This is another thing you need your String class to be smart about.
> You want an equality test between two strings that render identically
> to always be true even if their accented characters are encoded
> differently.  The right way to
> solve this is called "Early Uniform Normalization" (see
> http://www.w3.org/TR/2003/WD-charmod-20030822/#sec-Normalization); the idea
> is you normalize the composed characters at the time you create the
> string, then the internal equality test can be done with strcmp() or
> equivalent.
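
If I read that right, it could be sketched roughly as below. This is only
an illustration: early_normalize is a made-up helper name, the word "más"
is my own example, and I am assuming NFC as the normal form and using
String#unicode_normalize from current Ruby.

```ruby
# Hypothetical helper (not an existing String API): normalize once, when
# the string is created, so later comparisons are plain byte comparisons.
def early_normalize(str)
  str.encode("UTF-8").unicode_normalize(:nfc)
end

# The same visible word, encoded two different ways.
COMPOSED   = "m\u00E1s"    # LATIN SMALL LETTER A WITH ACUTE as one code point
DECOMPOSED = "ma\u0301s"   # "a" followed by COMBINING ACUTE ACCENT

# Without normalization the strings differ byte for byte.
def raw_equal?
  COMPOSED == DECOMPOSED
end

# After early normalization a plain == (strcmp-like) comparison is enough.
def normalized_equal?
  early_normalize(COMPOSED) == early_normalize(DECOMPOSED)
end

puts "raw:        #{raw_equal?}"
puts "normalized: #{normalized_equal?}"
```

The point of doing it at creation time is that the comparison itself stays
cheap; nothing has to be normalized per comparison.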

Does that mean that  binary.to_unicode.to_binary != binary  is possible?
That could turn out pretty bad, no?

>  -Tim
-- 
Christian Neukirchen  <chneukirchen / gmail.com>  http://chneukirchen.org