On 11/8/05, Austin Ziegler <halostatue / gmail.com> wrote:
> On 11/7/05, Curt Sampson <cjs / cynic.net> wrote:
> > On Tue, 8 Nov 2005, Daniel Wislocki wrote:
> >> Actually what I've found is that most Japanese don't use Unicode at
> >> all, but one of the other encodings like Shift-JIS.
> > Actually, the more technical Japanese folks seem to actively hate
> > Unicode, though I've never gotten a solid answer as to why that didn't
> > involve confusing glyphs with their representations. (Which I guess I
> > can see how it might upset you if your name came out with the wrong
> > version of the character--but besides being damn rare, we have
> > solutions for this.)
>
> I've done a bit of work with Unicode and there are two primary
> objections to the Unicode standard and representations by Japanese
> developers and users (I am generalising to CJK below):
>
> 1. The politics. The Eastern scripts were treated rather badly initially
>    and were painfully under-represented in Unicode 1.0 and probably
>    Unicode 2.0. I don't think that CJK were adequately represented until
>    Unicode 3.0, to be honest.
>
> 2. The political impact on representation. The existing CJK encodings
>    were/are relatively efficient for storing CJK, although some forms
>    used shift-in/shift-out byte markers, increasing the total number of
>    symbols that could be represented in a relatively efficient storage
>    format, even if processing time was a little slower.
>
>    In contrast, UTF-8 and UTF-16 are relatively inefficient. IIRC, most
>    Japanese encodings will use between 1 and 2 bytes per glyph. UTF-8
>    will use between 3 and 4 bytes per glyph. UTF-16 uses either 2 or 4
>    bytes per glyph.
>
> -austin
> --
> Austin Ziegler * halostatue / gmail.com
>                * Alternate: austin / halostatue.ca
>
>

Good summary, but I'd like to add some comments.

In most cases, the storage and text-processing inefficiency of the
Unicode encodings is a minor issue compared to other, more crucial
factors. The computer world is modern enough by now. :)
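
For a rough sense of the sizes being compared, here is a minimal sketch
in Python. The sample string, the list of encodings, and the helper name
bytes_per_encoding are my own illustrative choices, not anything from
the thread; it only counts encoded bytes for one short string and says
nothing about real-world documents, which usually mix in ASCII.

```python
# Illustrative only: SAMPLE, ENCODINGS and bytes_per_encoding are
# assumptions made for this sketch, not taken from the thread.
SAMPLE = "日本語"  # three kanji, all in the Basic Multilingual Plane

# Codec names as spelled in Python's standard library.
ENCODINGS = ["shift_jis", "euc_jp", "utf-8", "utf-16-le"]


def bytes_per_encoding(text, encodings=ENCODINGS):
    """Return a dict mapping each encoding name to the encoded byte length."""
    return {name: len(text.encode(name)) for name in encodings}


if __name__ == "__main__":
    for name, size in bytes_per_encoding(SAMPLE).items():
        print(f"{name:10} {size} bytes")
```

For these three characters the legacy encodings and UTF-16 come out at
2 bytes each, while UTF-8 needs 3 bytes each, which matches the figures
quoted above.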

As you also mentioned, what most CJK (especially Japanese) people
dislike first of all is the great, stupid process of Han Unification[1]
carried out by the Unicode Consortium. Another big obstacle is
incompatibility with the legacy systems already in use elsewhere.
Indeed, ordinary people have no strong or compelling reason to change
their charset. But the change can benefit some developers by making
'I18N' more convenient.

--
http://nohmad.sub-port.net