On 6/19/06, Yukihiro Matsumoto <matz / ruby-lang.org> wrote:
> Hi,
>
> In message "Re: Unicode roadmap?"
>     on Mon, 19 Jun 2006 21:39:33 +0900, "Michal Suchanek" <hramrach / centrum.cz> writes:
>
> |> a), unless either of strings is "ascii" and the other is "ascii"
> |> compatible. This point is arguable.
> |
> |What is "ascii"? Specifically I would like string operations to succeed
> |in cases when both strings are encoded as different subset of Unicode
> |(or anything else). ie concatenating an ISO-8859-2 and an ISO-8859-1
> |string should result in UTF-* string, not an error.
>
> Every encoding has an attribute named ascii_compat. EUC_JP, SJIS,
> ISO-8859-* and UTF-8 are declared ascii compatible, where EBCDIC,
> UTF-16 and UTF-32 are not. No other auto conversion shall be done,
> since we don't particularly encourage mixed encoding model.
>

I wonder. Why can't Strings throughout Ruby _always_ be represented as Unicode, and why not let ICU handle the conversion between the various encodings for incoming and outgoing data? (http://www.ibm.com/software/globalization/icu/)

I know there is a long-standing issue with Unicode's Han unification process, but without proper Unicode support Ruby is destined to be a toy for the English-speaking and Japanese communities only. (And as I'm gearing up to prepare a web site in Russian, Turkish and English, I feel that using Ruby could prove to be a major pain in the nether regions of my body :) )
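
For illustration only, here is a minimal sketch of the rule quoted above, written against the String and Encoding API of later Ruby releases (1.9 and up), which did not exist when this message was written. The helper names `try_concat` and `concat_as_utf8` are hypothetical and belong to this sketch, not to Ruby or to anyone's proposal. It assumes `ascii_compatible?` plays the role of the `ascii_compat` attribute mentioned in the quote.

```ruby
# Sketch only: uses the Encoding API from Ruby 1.9+, not anything
# available at the time of this thread.

# The per-encoding "ascii compatible" attribute.
%w[UTF-8 ISO-8859-1 EUC-JP Shift_JIS UTF-16LE UTF-32LE].each do |name|
  puts "#{name}: #{Encoding.find(name).ascii_compatible?}"
end

# Hypothetical helper: plain concatenation, returning nil when Ruby
# refuses to mix the two encodings.
def try_concat(a, b)
  a + b
rescue Encoding::CompatibilityError
  nil
end

# Hypothetical helper: the behaviour asked for in the quoted question,
# done explicitly by transcoding both operands to UTF-8 first.
def concat_as_utf8(a, b)
  a.encode(Encoding::UTF_8) + b.encode(Encoding::UTF_8)
end

latin1 = "caf\u00E9".encode(Encoding::ISO_8859_1)            # non-ASCII, ISO-8859-1
latin2 = "\u0142\u00F3d\u017A".encode(Encoding::ISO_8859_2)  # non-ASCII, ISO-8859-2
ascii  = "abc".encode(Encoding::US_ASCII)                    # ASCII only

# ASCII-only string plus ascii-compatible string: allowed.
puts try_concat(latin1, ascii).encoding

# Two different non-ASCII encodings: no automatic conversion.
puts try_concat(latin1, latin2).inspect

# Explicit conversion gives one UTF-8 string.
puts concat_as_utf8(latin1, latin2).encoding
```

The explicit `concat_as_utf8` step is what an always-Unicode String model would do implicitly at the I/O boundary. The sketch shows only the explicit form and does not use ICU.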