On 11/8/05, Austin Ziegler <halostatue / gmail.com> wrote:
> On 11/7/05, Curt Sampson <cjs / cynic.net> wrote:
> > On Tue, 8 Nov 2005, Daniel Wislocki wrote:
> >> Actually what I've found is that most Japanese don't use Unicode at
> >> all, but one of the other encodings like Shift-JIS.
> > Actually, the more technical Japanese folks seem to actively hate
> > Unicode, though I've never gotten a solid answer as to why that didn't
> > involve confusing glyphs with their representations. (Which I guess I
> > can see how it might upset you if your name came out with the wrong
> > version of the character--but besides being damn rare, we have
> > solutions for this.)
>
> I've done a bit of work with Unicode and there are two primary
> objections to the Unicode standard and representations by Japanese
> developers and users (I am generalising to CJK below):
>
> 1. The politics. The Eastern scripts were treated rather badly initially
>    and were painfully under-represented in Unicode 1.0 and probably
>    Unicode 2.0. I don't think that CJK were adequately represented until
>    Unicode 3.0, to be honest.
>
> 2. The political impact on representation. The existing CJK encodings
>    were/are relatively efficient for storing CJK, although some forms
>    used shift-in/shift-out byte markers, increasing the total number of
>    symbols that could be represented in a relatively efficient storage
>    format, even if processing time was a little slower.
>
>    In contrast, UTF-8 and UTF-16 are relatively inefficient. IIRC, most
>    Japanese encodings will use between 1 and 2 bytes per glyph. UTF-8
>    will use between 3 and 4 bytes per glyph. UTF-16 uses either 2 or 4
>    bytes per glyph.
>
> -austin
> --
> Austin Ziegler * halostatue / gmail.com
>                * Alternate: austin / halostatue.ca
>

Good summary, but I'd like to add some comments. In most cases, the
storage and text-processing inefficiency of the Unicode encodings is a
minor issue compared to other, more crucial factors.
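
To put rough numbers on the per-character sizes quoted above, here is a
small Python sketch that encodes one Japanese string in several
encodings and counts the bytes. The sample string and the helper name
`encoded_sizes` are my own choices for illustration, and a single short
string proves nothing about real-world text, so treat it as a sanity
check only.

```python
# Illustrative only: SAMPLE and encoded_sizes() are hypothetical names,
# not taken from the thread. Uses only codecs from the standard library.

# Eight characters: three kanji, one hiragana, four full-width katakana.
SAMPLE = "日本語のテキスト"

# utf-16-le is used instead of utf-16 so that no byte-order mark is counted.
ENCODINGS = ("shift_jis", "euc_jp", "utf-8", "utf-16-le")


def encoded_sizes(text, encodings=ENCODINGS):
    """Return a dict mapping each encoding name to the encoded byte length."""
    return {enc: len(text.encode(enc)) for enc in encodings}


if __name__ == "__main__":
    for enc, size in encoded_sizes(SAMPLE).items():
        per_char = size / len(SAMPLE)
        print(f"{enc:<10} {size:>3} bytes ({per_char:.1f} per character)")
```

For this particular sample, Shift_JIS, EUC-JP and UTF-16 each come to 2
bytes per character and UTF-8 comes to 3. Characters outside the Basic
Multilingual Plane take 4 bytes in both UTF-8 and UTF-16.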
Computers today are modern enough to absorb that overhead. :)

As you also mentioned, what most CJK (especially Japanese) people
dislike first and foremost is the great stupid process of Han
Unification[1] carried out by the Unicode Consortium. Another big
obstacle is incompatibility with the legacy systems still in use
elsewhere. Indeed, ordinary users have no strong or compelling reason
to change their charset, though the change can benefit some developers
by making I18N more convenient.

--
http://nohmad.sub-port.net