On Wed, 26 Jun 2002, Jan Witt wrote:

> As I see it, the Unicode effort has been deeply
> misguided right from the beginning....

I think that you're deeply misguided right from the beginning about
what Unicode is supposed to do. :-)

> (1) in many languages there are more glyphs than
> letters in the alphabet, e.g. because of ligatures,
> i.e. letters that get intertwined with their
> neighbors. (take Hindi or Arabic as examples)
> Unicode does not cater for this.

Nor is it supposed to. These are typesetting issues, not data issues.
The word "fish" contains the same letters whether or not you use a
ligature for the "fi".

> (2) Diacritics are not everywhere as simple as
> accents in French, umlauts in German, which
> luckily could be fit into Latin-1.

So? Unicode deals with a *lot* of diacritical marks. (Take a look at
the Vietnamese support, for example.) Where exactly does Unicode fall
down in supporting diacriticals?

> (3) Some languages are written from left to right,
> some from the top down and texts may be mixed.

Unicode supports mixed-direction writing.

> Please consider that a multilingual text editor
> must know about the [possibly varying] glyph bindings
> of all of its
> languages.

Not really. I get by just fine with an editor that cannot generate
"proper" (in print terms) glyphs for "fi", "fl", "ffl", and so on.
I suspect most others do, too.

> (5) Japanese, as you probably know, has the rich
> choice of kanji characters and the two kana alphabets,
> but no ligatures.

Since I know a little bit of Japanese, I'd be particularly interested
in what you think the Unicode problems are in relation to Japanese.

> (6) Collating sequences are a nontrivial issue.
> In classical Spanish, e.g. LL and CH are
> considered
> separate characters.

Unicode does not specify any collating sequences.

cjs
-- 
Curt Sampson  <cjs / cynic.net>   +81 90 7737 2974   http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light.  --XTC
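
For anyone who wants to poke at the ligature, diacritic, and direction points above, here is a minimal sketch. It assumes Python 3 and its standard unicodedata module, neither of which the message itself relies on, and the helper name same_letters is hypothetical. It only illustrates that the "fi" ligature is a presentation form of the same letters, that stacked Vietnamese diacritics have a single canonical form, and that characters carry a direction property.

```python
import unicodedata


def same_letters(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after NFKC normalization,
    which folds compatibility characters such as ligatures into plain letters."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)


# U+FB01 (LATIN SMALL LIGATURE FI) is a compatibility character.
# As data, "fish" is the same four letters with or without it.
print(same_letters("\ufb01sh", "fish"))

# Vietnamese "e" with circumflex and dot below: the combining marks can be
# typed in either order, and NFC composes both to the single code point U+1EC7.
typed = "e\u0302\u0323"          # e + circumflex + dot below
composed = unicodedata.normalize("NFC", typed)
print(composed == "\u1ec7", len(typed), len(composed))

# Each character carries a bidirectional class that rendering software
# can use to lay out mixed-direction text.
print(unicodedata.bidirectional("a"))        # Latin letter
print(unicodedata.bidirectional("\u05d0"))   # Hebrew letter alef
```

Drawing the "fi" ligature or ordering a mixed-direction line is still the job of the font and the rendering software; the stored text is the same either way.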