Hi,

In message "Re: Unicode roadmap?"
    on Sun, 18 Jun 2006 23:46:40 +0900, Juergen Strobel <strobel / secure.at> writes:

|Language implementation, and usage of the String class should be
|easier if this set is
|
|- well defined 
|- All characters are equally allowed in all Strings.

I understand these attributes might make implementation easier.  But
who cares, if I don't?  And I am not sure how they make usage easier,
really.

Somebody who owns gigabytes of text data in a legacy encoding (e.g. me)
wants to avoid converting back and forth between Unicode and the
legacy encoding every time.  Somebody else wants to do text processing
on historical texts whose character set is far bigger than Unicode.
The "well-defined" simple implementation just rules out those demands.
The M17N approach, on the contrary, does not get in the way of a
Universal Character Set solution.  You just need to choose Unicode
(UTF-8 or UTF-16) as the internal string representation, and convert
encodings on I/O as you would have done in Unicode-centric languages.
Nothing lost.
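
To make the two styles concrete, here is a minimal sketch.  It assumes
the per-string encoding API found in Ruby 1.9 and later
(String#force_encoding, String#encode), which is not necessarily what
any prototype looked like; the variable names are only illustrative.

```ruby
# Sketch only: relies on the String encoding API of Ruby 1.9 and later.

# M17N style: legacy data keeps its own encoding, with no conversion.
# These six bytes are the EUC-JP form of a three-character Japanese word.
legacy = "\xC6\xFC\xCB\xDC\xB8\xEC".b.force_encoding(Encoding::EUC_JP)
legacy.valid_encoding?   # the bytes are well-formed EUC-JP
legacy.length            # counted in characters (3), not bytes
legacy.bytesize          # 6 bytes, untouched

# Unicode-centric style: the same program can still choose UTF-8 as its
# internal representation and convert at the I/O boundary.
utf8 = legacy.encode(Encoding::UTF_8)        # convert on input
utf8 == "\u65E5\u672C\u8A9E"                 # same three characters
round_trip = utf8.encode(Encoding::EUC_JP)   # convert back on output
round_trip == legacy                         # byte-for-byte identical
```

Both styles run on the same String class; the second one is simply a
policy the program chooses, not something the language imposes.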

You may worry about implementation difficulty (and performance), but
don't.  It's _my_ concern.  I made a prototype, and I am convinced
that I can implement it with acceptable performance.

|Unicode code points are pretty good in this respect, better than the
|union of all characters in all encodings of possible M17N Strings.
|And we may use private extensions to Unicode for legacy characters not
|included in Unicode already.

"Private extensions"?  No.  They would just cause another nightmare.

							matz.