Tim Bray <tbray / textuality.com> writes:

>> You need glyphs, and some glyphs can be
>> produced with multiple code points (e.g., LOWERCASE A + COMBINING
>> ACUTE ACCENT as opposed to A ACUTE).
>
> This is another thing you need your String class to be smart about.
> You want an equality test between "má" and "má" to always be true
> even if their "á" characters are encoded differently. The right way to
> solve this is called "Early Uniform Normalization" (see
> http://www.w3.org/TR/2003/WD-charmod-20030822/#sec-Normalization); the idea
> is you normalize the composed characters at the time you create the
> string, then the internal equality test can be done with strcmp() or
> equivalent.

Does that mean that binary.to_unicode.to_binary != binary is possible?
That could turn out pretty bad, no?

> -Tim

--
Christian Neukirchen <chneukirchen / gmail.com> http://chneukirchen.org
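
To make the round-trip question concrete, here is a minimal sketch. It assumes Ruby's String#unicode_normalize (available in Ruby 2.2 and later, so not something this thread could rely on) and assumes NFC as the normalization form. The variable names are only illustrative, and the to_unicode / to_binary methods from the question are hypothetical and not defined here.

```ruby
# Two encodings of the same visible text "má".
decomposed = "ma\u0301"  # m, a, U+0301 COMBINING ACUTE ACCENT
composed   = "m\u00E1"   # m, U+00E1 LATIN SMALL LETTER A WITH ACUTE

# Without normalization, a plain comparison sees two different strings.
raw_equal = (decomposed == composed)

# Normalizing both at creation time (NFC assumed here) lets a plain
# byte-wise comparison do the job.
nfc_a = decomposed.unicode_normalize(:nfc)
nfc_b = composed.unicode_normalize(:nfc)
normalized_equal = (nfc_a == nfc_b)

# The cost: input that was not already in NFC does not survive the
# round trip byte for byte.
round_trip_preserved = (nfc_a.bytes == decomposed.bytes)

puts "raw_equal:            #{raw_equal}"
puts "normalized_equal:     #{normalized_equal}"
puts "round_trip_preserved: #{round_trip_preserved}"
```

Under these assumptions the two raw strings compare unequal, the normalized ones compare equal, and the decomposed input comes back with different bytes. So yes, the inequality in the question can hold for any input that was not already in normalized form.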