On Wed, Jun 14, 2006 at 05:26:58PM +0900, Victor Shepelev wrote:
> From: Dmitry Severin [mailto:dmitry.severin / gmail.com]
> Sent: Wednesday, June 14, 2006 11:20 AM
> > To: ruby-talk ML
> > Subject: Re: Unicode roadmap?
> >
> > Almost all typical tasks on Unicode can be handled with UTF8 support in
> > Regexp, Iconv, jcode and $KCODE=u, and unicode[1] library (as in
> > unicode_hack[2]) :)
> > (but case-insensitive regexp don't work for non ASCII chars in Ruby 1.8,
> > that can be probably solved using latest Oniguruma).
> >
> > But if you're looking for deeper level of "Unicode support", e.g. as
> > described in Unicode FAQ[3], those problems aren't about handling Unicode
> > per se, but are rather L10N/I18N problems, such as locale dependent text
> > breaks, collation, formatting etc.
> > To deal with them from Ruby take a look at somewhat broken wrappers to ICU
> > library icu4r[4], g11n[5] and Ruby/CLDR[6].
>
> Thanks Dmitry!
>
> > And if you want Unicode as default String encoding and want to use
> > national
> > chars in names for your vars/functions/classes in Ruby code, I believe, it
> > will never happen. :)
>
> Hmmm.. I think Unicode IS default String encoding when $KCODE=u
> Not?
>
> V.

Strictly speaking, Unicode is not an encoding, but UTF-8 is.

For my personal vision of "proper" Unicode support, I'd like to have
UTF-8 as the standard internal string format, Unicode code points as
the standard character code, and *all* String functions just working
intuitively "right" on a character basis rather than a byte basis.
Thus the internal String encoding is a technical matter only, as long
as it is capable of supporting all Unicode characters, and these
internal details are not exposed via public methods. I/O and String
functions should be able to convert to and from different external
encodings, via plugin modules.
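To make the byte versus character distinction concrete, here is a rough
sketch that sticks to pack/unpack and a /u regexp, which as far as I know
behave the same on 1.8 and on later interpreters. The helper names
(byte_count, char_count, latin1_to_utf8, utf8_to_latin1) are made up for
illustration, and the hand-rolled ISO-8859-1 conversion is only a stand-in
for what a real converter such as Iconv would do.

```ruby
# Hypothetical helpers for illustration; none of these names are part of Ruby.

# Number of bytes in the string, regardless of what they encode.
def byte_count(str)
  str.unpack('C*').length
end

# Number of Unicode code points, assuming str holds valid UTF-8.
def char_count(str)
  str.unpack('U*').length
end

# Re-encode ISO-8859-1 bytes as UTF-8.
# This works because each Latin-1 byte value equals its Unicode code point.
def latin1_to_utf8(str)
  str.unpack('C*').pack('U*')
end

# Re-encode UTF-8 as ISO-8859-1 bytes.
# Assumption: every code point in str is <= 0xFF; anything else has no Latin-1 form.
def utf8_to_latin1(str)
  str.unpack('U*').pack('C*')
end

utf8 = "r\xC3\xA9sum\xC3\xA9"   # "resume" with two e-acute, as UTF-8 bytes
puts byte_count(utf8)            # => 8
puts char_count(utf8)            # => 6
puts utf8.scan(/./u).length      # => 6, counted with a UTF-8 aware regexp

latin1 = "r\xE9sum\xE9"          # the same word as ISO-8859-1 bytes
# Compare byte for byte so the result does not depend on encoding tags.
puts latin1_to_utf8(latin1).unpack('C*') == utf8.unpack('C*')   # => true
```

The same six characters occupy eight bytes in UTF-8 and six in
ISO-8859-1, which is exactly the sort of detail I would like String
methods to hide.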
Note I don't require non-Unicode String classes, just the possibility
to do I/O with foreign character sets, or conversion to byte arrays.
Strings should consist of characters, not just be a sequence of bytes
that is meaningless without external information about their encoding.

No Ruby apps or libraries should break because they are surprised by
(Unicode) Strings, or it should be obvious the fault is with them.

Optionally, additional String classes with different internal Unicode
encodings might be a boon for certain performance-sensitive
applications, and they should all work together much like Numbers of
different kinds do.

While I want Ruby source files to be UTF-8 encoded, in no way do I
want identifiers to consist of additional national characters. I like
names in APIs everyone can actually type, but literal Strings are a
different matter.

I know this is a bit vague on the one hand, and might demand intrusive
changes on the other. Java's history shows proper Unicode support is no
trivial matter, and I don't feel qualified to give advice on how to
implement this. It's just my vision of how Strings ideally would be.

And of course for my personal vision to become perfect, everyone
outside Ruby should adopt Unicode too.

Jgen

-- 
The box said it requires Windows 95 or better so I installed Linux