On Thu, Aug 01, 2002 at 11:23:38PM +0900, Curt Sampson wrote:
> > > I have just looked at my 3.0 standard and the 3.1 and 3.2 updates on the
> > > web site, and I do not see any evidence of this. Did I miss something?
> > > See the message I just posted for the details as I know them.
> >
> > http://www.unicode.org/unicode/reports/tr19/tr19-9.html :
> > [section 3, Relation to ISO/IEC 10646 and UCS-4]
> 
> Actually, I was looking for someone to attack my argument, not
> support it. :-)
:) The reason I pointed to that section is that there will soon be no
difference between ISO/IEC 10646 and Unicode in terms of covered code
space. Once that is accomplished, uniform expansion into the unused bits
of the 32-bit space can begin -- if the CJK community has an interest in
it, of course. As you may remember, there were complaints in the past
about the 'small' code space available for covering CJK in Unicode.
Relatively soon that will no longer be the case.

> > > Mojikyo wants to give maximum flexibility in the display of Chinese
> > > characters. Given the number and complexity of kanji, these two aims are
> > > basically incompatible.
> >
> > I still don't see why both goals should be incompatible a priori. But this
> > is possibly off-topic here. :)
> 
> Partly efficiency concerns. As the speed of CPUs increases relative
> to memory, the relative cost of string handling (which is pretty
> memory intensive) gets higher and higher. And also things like ease
> of use; avoiding duplications makes things like pattern matching
> and use of dictionaries much easier. (Imagine, for example, that
> ASCII had two 'e's in it, and people used one or the other randomly,
> as they liked. Now instead of writing s/feet/fleet/, you have to
> write at least s/f[ee][ee]t/fleet/, or in certain fussy cases even
> s/f([ee][ee])t/fl\1t/. Ouch.)
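Just to make the pain concrete: a rough Python sketch, using Cyrillic 'е'
(U+0435) as a real-life stand-in for that hypothetical second ASCII 'e':

    import re

    # ASCII 'e' (U+0065) and Cyrillic 'е' (U+0435) look identical in most
    # fonts but are distinct code points, so the pattern has to allow
    # either of them in each position.
    E = "[e\u0435]"
    pattern = re.compile(f"f({E}{E})t")

    # The backreference preserves whatever mix of 'e's the text used.
    print(pattern.sub(r"fl\1t", "feet"))        # -> fleet
    print(pattern.sub(r"fl\1t", "f\u0435et"))   # -> flеet (mixed 'e's kept)
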
Well, it raises a completely different set of problems. It attacks the very
foundation on which the current notion of character encoding is built.
Remember that a 'character encoding' is usually understood as a way to
address and differentiate 'characters' in a 'string' using a single
property -- position in some abstract 'alphabet' that has little to do with
real-life language properties. For example, CP1251, which is used in
Belarus and other Slavic countries, has two 'i's -- one from ASCII and
another (with _exactly_ the same glyph in fonts) for the Belarusian and
Ukrainian languages. There is no information within the CP1251 encoding
itself to differentiate the two, short of attaching an external property
table (which is what the IANA proposal does by mapping every position of
the encoding to some Unicode code point, which in turn has all the needed
properties assigned).
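
A quick Python sketch of those two 'i's (the byte values come from the
standard cp1251 codec tables):

    # ASCII 'i' sits at 0x69 and the Belarusian/Ukrainian 'i' at 0xB3 (its
    # capital at 0xB2).  Inside CP1251 they are just two positions; only
    # the mapping to Unicode code points says they are different letters.
    for byte in (b"i", b"\xb3"):
        ch = byte.decode("cp1251")
        print(byte, hex(ord(ch)), ch)

    # b'i'     0x69    i   (LATIN SMALL LETTER I)
    # b'\xb3'  0x456   і   (CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I)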

What you are showing above is a need to perform operations on these
'external' properties, the way it is done in ICU, for example. Actually, it
would be much more productive to implement something like Mojikyo inside
ICU.
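
To make "operations on external properties" a bit more concrete, here is a
minimal sketch using Python's unicodedata module as a stand-in for ICU's
character property API; ICU proper exposes the same data (plus script,
decomposition, collation and so on):

    import unicodedata

    # Once the CP1251 positions are mapped to Unicode code points, the
    # 'external' properties are enough to tell the look-alike 'i's apart.
    for ch in ("i", "\u0456"):
        print(ch, unicodedata.name(ch), unicodedata.category(ch))

    # i LATIN SMALL LETTER I Ll
    # і CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I Ll

Presumably a Mojikyo-like layer would hang its extra properties (variant
groups, preferred glyphs) off the same code points, which is why doing it
inside ICU looks like a natural fit.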


-- 
/ Alexander Bokovoy
---
Ever notice that even the busiest people are never too busy to tell you
just how busy they are?