On Mon, Jun 19, 2006 at 01:33:54AM +0900, Yukihiro Matsumoto wrote:
> Hi,
>
> In message "Re: Unicode roadmap?"
>     on Sun, 18 Jun 2006 23:46:40 +0900, Juergen Strobel <strobel / secure.at> writes:
>
> |Language implementation, and usage of the String class should be
> |easier if this set is
> |
> |- well defined
> |- All characters are equally allowed in all Strings.
>
> I understand these attributes might make implementation easier.  But
> who cares if I don't care.  And I am not sure how these make usage
> easier, really.
>
> Somebody who owns gigabytes of text data in legacy encoding (e.g. me),
> wants to avoid encoding conversion back and forth between Unicode and
> legacy encoding everytime.  Another somebody want text processing on
> historical text which character set is far bigger than Unicode.  The
> "well-defined" simple implementation just prohibits those demands.  On
> the contrary, M17N approach does not bother Universal Character Set
> solution.  You just need to choose Unicode (UTF-8 or UTF-16) as
> internal string representation, and convert encoding on I/O as you
> might have done in Unicode centric languages.  Nothing lost.
>
> You may worry about implementation difficulty (and performance), but
> don't.  It's _my_ concern.  I made a prototype, and have convinced
> that I can implement it with acceptable performance.

I never worried about performance much, that's Austin. :P

Thanks for clarifying that. So far I could not find much info on how
exactly M17N will work, especially on the role of the encoding tag, so
I had to guess a lot.

Given your explanation, it seems our ways are quite similar on the
interface side of things, as far as Unicode is concerned.
You chose a more powerful (and more complex) parametric class design,
where I would have left open only the possibility of transparently
usable subclasses for performance reasons.

I am happy we've worked that out now. And you are right, I am not that
much interested in the implementation, thank you for doing it. My
concern was with the interface of the String class, but several
posters misunderstood me and tried to draw me into implementation
issues.

Jürgen

> |Unicode code points are pretty good in this respect, better than the
> |union of all characters in all encodings of possible M17N Strings.
> |And we may use private extensions to Unicode for legacy characters not
> |included in Unicode already.
>
> "private extensions".  No.  It just cause another nightmare.
>
>                                                         matz.
>
>

-- 
The box said it requires Windows 95 or better so I installed Linux
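
The approach matz describes in the quoted text (choose UTF-8 as the internal string representation and convert at the I/O boundary) might look roughly like the following in Ruby. This is only a sketch under assumptions: it relies on String#force_encoding and String#encode as found in Ruby 1.9 and later, which did not exist when this thread was written, and the sample string `legacy_bytes` is made up for illustration.

```ruby
# Sketch only: String#force_encoding and String#encode are Ruby 1.9+ methods
# and postdate this thread. `legacy_bytes` is a hypothetical sample input.

# "Jürgen" as it would arrive from an ISO-8859-1 source
# (0xFC is u-umlaut in Latin-1). Tag the raw bytes with their encoding.
legacy_bytes = "J\xFCrgen".dup.force_encoding(Encoding::ISO_8859_1)

# Input boundary: convert legacy data to the chosen internal encoding.
internal = legacy_bytes.encode(Encoding::UTF_8)

# Inside the program every string is UTF-8, so code points are Unicode.
codepoints = internal.codepoints

# Output boundary: convert back to the legacy encoding when writing.
roundtrip = internal.encode(Encoding::ISO_8859_1)

puts internal.encoding                       # UTF-8
puts internal.bytesize                       # 7 (the umlaut takes two bytes)
puts roundtrip.bytes == legacy_bytes.bytes   # true
```

Someone who wants to keep data in a legacy encoding throughout would simply skip both conversions and work on the tagged string directly, which seems to be the flexibility the M17N design is after.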