On 6/19/06, Yukihiro Matsumoto <matz / ruby-lang.org> wrote:
> Hi,
>
> In message "Re: Unicode roadmap?"
>     on Mon, 19 Jun 2006 21:39:33 +0900, "Michal Suchanek" <hramrach / centrum.cz> writes:
>
> |> a), unless either of the strings is "ascii" and the other is "ascii"
> |> compatible.  This point is arguable.
> |
> |What is "ascii"? Specifically, I would like string operations to succeed
> |in cases when both strings are encoded as different subsets of Unicode
> |(or anything else), i.e. concatenating an ISO-8859-2 and an ISO-8859-1
> |string should result in a UTF-* string, not an error.
>
> Every encoding has an attribute named ascii_compat.  EUC_JP, SJIS,
> ISO-8859-* and UTF-8 are declared ascii compatible, whereas EBCDIC,
> UTF-16 and UTF-32 are not.  No other auto conversion shall be done,
> since we don't particularly encourage a mixed-encoding model.
>
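
A minimal sketch of the rule quoted above, to make the discussion concrete. It assumes a Ruby that ships the Encoding class with Encoding#ascii_compatible? and String#ascii_only? (1.9 or later), which is not necessarily the API being proposed in this thread. The helper name concat_allowed? is hypothetical and exists only for illustration.

```ruby
# Hypothetical helper (not part of Ruby): decides whether two strings may be
# combined under the rule "same encoding, or one side is plain ASCII and the
# other side's encoding is ASCII compatible".
def concat_allowed?(a, b)
  return true if a.encoding == b.encoding
  (a.ascii_only? && b.encoding.ascii_compatible?) ||
    (b.ascii_only? && a.encoding.ascii_compatible?)
end

# Sample strings, built by transcoding from UTF-8 literals.
ascii  = "abc"                                       # 7-bit only
latin1 = "caf\u00E9".encode("ISO-8859-1")            # contains a non-ASCII byte
latin2 = "\u0142\u00F3d\u017A".encode("ISO-8859-2")  # contains non-ASCII bytes
utf16  = "abc".encode("UTF-16LE")                    # not ASCII compatible

puts concat_allowed?(ascii, latin1)   # plain ASCII + ASCII-compatible encoding
puts concat_allowed?(latin1, latin2)  # two different non-ASCII payloads
puts concat_allowed?(ascii, utf16)    # UTF-16 is not ASCII compatible
```

Under this rule the ISO-8859-1 plus ISO-8859-2 case is rejected rather than silently promoted to a UTF-* string, which is exactly the behaviour being questioned.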

I wonder: why can't Strings throughout Ruby _always_ be represented
as Unicode, and why not let ICU
(http://www.ibm.com/software/globalization/icu/) handle the conversion
between various encodings for incoming and outgoing data? I know there
is a long-standing issue with Unicode's Han unification process, but
without proper Unicode support Ruby is destined to be a toy for
English-speaking and Japanese communities only. (And as I'm gearing up
to prepare a website in Russian, Turkish and English, I feel that
using Ruby could prove to be a major pain in the nether regions of my
body :) )
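
To illustrate the "Unicode inside, convert at the edges" model, here is a rough sketch. It does not use ICU: it swaps in Ruby's own String#encode transcoder (available in 1.9 and later) as a stand-in, and the helper names to_internal and to_external are hypothetical.

```ruby
# Hypothetical helper: tag raw incoming bytes with their known source
# encoding, then transcode them to UTF-8 for internal use.
def to_internal(bytes, source_encoding)
  bytes.dup.force_encoding(source_encoding).encode(Encoding::UTF_8)
end

# Hypothetical helper: transcode an internal UTF-8 string for output.
def to_external(str, target_encoding)
  str.encode(target_encoding)
end

# Simulate incoming data in two legacy encodings (raw bytes, no encoding tag).
russian_bytes = "\u043F\u0440\u0438\u0432\u0435\u0442".encode("KOI8-R").b
turkish_bytes = "\u0131\u015F\u0131k".encode("ISO-8859-9").b

russian = to_internal(russian_bytes, "KOI8-R")
turkish = to_internal(turkish_bytes, "ISO-8859-9")

# Once both are UTF-8, mixing them is no longer an encoding problem.
page = russian + " / " + turkish
puts page.encoding

# Convert back only when the data leaves the program.
outgoing = to_external(russian, "KOI8-R")
puts outgoing.bytes == russian_bytes.bytes
```

The design choice is that conversion happens only on input and output, so every string operation in between sees a single encoding.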