On Wed, Jun 14, 2006 at 05:26:58PM +0900, Victor Shepelev wrote:
> From: Dmitry Severin [mailto:dmitry.severin / gmail.com]
> Sent: Wednesday, June 14, 2006 11:20 AM
> > To: ruby-talk ML
> > Subject: Re: Unicode roadmap?
> > 
> > Almost all typical tasks on Unicode can be handled with UTF-8 support in
> > Regexp, Iconv, jcode and $KCODE=u, and the unicode[1] library (as in
> > unicode_hack[2]) :)
> > (but case-insensitive regexps don't work for non-ASCII chars in Ruby 1.8;
> > that can probably be solved using the latest Oniguruma).
> > 
> > But if you're looking for a deeper level of "Unicode support", e.g. as
> > described in the Unicode FAQ[3], those problems aren't about handling
> > Unicode per se, but are rather L10N/I18N problems, such as locale-dependent
> > text breaks, collation, formatting etc.
> > To deal with them from Ruby, take a look at the somewhat broken wrappers
> > to the ICU library: icu4r[4], g11n[5] and Ruby/CLDR[6].
> 
> Thanks Dmitry!
>  
> > And if you want Unicode as the default String encoding and want to use
> > national chars in names for your vars/functions/classes in Ruby code, I
> > believe it will never happen. :)
> 
> Hmmm... I think Unicode IS the default String encoding when $KCODE=u.
> Isn't it?
> 
> V.

Strictly speaking, Unicode is not an encoding, but UTF-8 is. 
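
For the record, the $KCODE=u route mentioned above looks roughly like
this in 1.8 (just a sketch, assuming a UTF-8 encoded source file and
the iconv standard library):

  $KCODE = 'u'                    # treat strings and regexps as UTF-8
  require 'iconv'

  "naïve"  =~ /ï/u     # => 2, the /u flag makes the Regexp UTF-8 aware
  "RÉSUMÉ" =~ /é/iu    # => nil: /i stays ASCII-only in 1.8, as noted above
  Iconv.conv('ISO-8859-1', 'UTF-8', "naïve")   # re-encode at the boundary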

For my personal vision of "proper" Unicode support, I'd like to have
UTF-8 as the standard internal string format, Unicode code points as
the standard character code, and *all* String functions to just work
intuitively "right" on a character basis rather than a byte basis.
The internal String encoding then becomes a purely technical matter,
as long as it can represent all Unicode characters and these internal
details are not exposed via public methods.
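
To make the character-versus-byte distinction concrete, this is what
1.8 does today (a sketch; it assumes $KCODE=u, the jcode standard
library and a UTF-8 source file):

  $KCODE = 'u'
  require 'jcode'

  s = "naïve"          # five characters, six UTF-8 bytes
  s.length             # => 6   bytes -- today's behaviour
  s.jlength            # => 5   characters, via jcode
  s.unpack('U*')       # => [110, 97, 239, 118, 101]   code points
  s[2]                 # => 195, a lone byte, not a character

Under the vision above, length would be 5 and s[2] would be the
character "ï", with the byte-level view available only on request.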

I/O and String functions should be able to convert to and from
different external encodings, via plugin modules. Note I don't ask for
non-Unicode String classes, just for the possibility to do I/O with
foreign character sets, or conversion to byte arrays. Strings should
consist of characters, not just be a sequence of bytes that is
meaningless without external information about its encoding.
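
Today that boundary conversion has to be done by hand, e.g. with Iconv
from the standard library (a sketch; the file names are made up):

  require 'iconv'

  latin1 = File.read('legacy.txt')                    # ISO-8859-1 bytes
  utf8   = Iconv.conv('UTF-8', 'ISO-8859-1', latin1)  # convert at the boundary
  bytes  = utf8.unpack('C*')                          # explicit byte-array view
  File.open('out.txt', 'w') do |f|
    f.write(Iconv.conv('ISO-8859-1', 'UTF-8', utf8))  # and back on the way out
  end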

No Ruby apps or libraries should break because they are surprised by
(Unicode) Strings; if they do, it should be obvious the fault is theirs.

Optionally, additional String classes with different internal Unicode
encodings might be a boon for certain performance-sensitive
applications, and they should all work together much like the
different kinds of Numbers do.
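
Purely hypothetically, such classes might coerce each other the way
Numerics do. Nothing below exists in Ruby; it is only meant to
illustrate the idea:

  # Hypothetical sketch -- neither class is real; the point is that
  # mixed operations coerce transparently, like Integer + Float.
  class UTF8Str
    attr_reader :bytes                # internal form: UTF-8 byte string
    def initialize(bytes)
      @bytes = bytes
    end
    def to_utf8
      self
    end
    def +(other)
      UTF8Str.new(bytes + other.to_utf8.bytes)    # coerce at the boundary
    end
  end

  class UTF32Str                      # internal form: array of code points
    def initialize(utf8_bytes)
      @points = utf8_bytes.unpack('U*')
    end
    def to_utf8
      UTF8Str.new(@points.pack('U*'))
    end
  end

  (UTF8Str.new("café ") + UTF32Str.new("naïve")).bytes
  # => "café naïve" as UTF-8, whatever each operand used internally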

While I want Ruby source files to be UTF-8 encoded, I do not want
identifiers to be allowed to contain national characters beyond ASCII.
I like names in APIs everyone can actually type, but literal Strings
are a different matter.
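
In other words, something like this (a UTF-8 saved file with a
made-up method name):

  def greeting                  # ASCII-only identifier anyone can type
    "¡Hola, señor Müller!"      # non-ASCII only inside the literal
  end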

I know this is a bit vague on the one hand, and might demand intrusive
changes on the other. Java's history shows that proper Unicode support
is no trivial matter, and I don't feel qualified to give advice on how
to implement this. It's just my vision of how Strings ideally would be.

And of course for my personal vision to become perfect, everyone
outside Ruby should adopt Unicode too.

Jürgen

-- 
 The box said it requires Windows 95 or better so I installed Linux
