Disclaimer: I haven't used the 1.9 encoding stuff so far. Nevertheless, my 0.02 EUR:

2008/9/17 Michael Selig <michael.selig / fs.com.au>:
> <soapbox>
>
> Using Ruby SHOULD be making our lives easier, not harder. Other languages
> like Python have taken an easier route to m17n - represent all strings
> internally as unicode codepoints.

Which is also what Java does.  I have always found Java's approach to
encodings very clean and workable.  But if I remember correctly, Matz
once said that Unicode does not cover all Asian symbols, so it might
not be too good a choice for the internal representation.

> Then there should never be a need to check
> encoding compatibility, right? I am not saying that this is a perfect
> solution either, by the way. But having to work around this "Encoding
> Compatibility Error" all the time is just a pain for apps which need to work
> in different countries with different locales. Unfortunately it is leading
> me towards the path of having to transcode everything to UTF-8, even though
> in 99% of cases all the data IS going to be compatible and be in the user's
> locale. I don't want so much of my time taken up, and be forced to write
> ugly code to take care of the remaining 1%. Maybe the problem is that Ruby
> is being too generous supporting all these different encodings internally!
> That was one reason why I raised the idea of removing UTF-16 & 32 support -
> at least that way I know that the ASCII strings from my program can work
> with any user data. But then the further problem: What if you need to work
> with (or at least take into account the possibility of) 2 or more non-ascii
> (but ascii compatible) encodings (eg: the user's locale & UTF-8)?
>
> What may solve this issue is if Ruby itself would automatically encode
> incompatible strings in a compatible encoding (UTF-8 I guess). The only time
> you should then get "Encoding Compatibility Errors" is when writing data to
> a file or network stream in a certain encoding and a character cannot be
> represented. That's it.
>
> Just a thought...
>
> </soapbox>
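
To make the quoted complaint concrete: the error only shows up once two
strings with different encodings actually contain non-ASCII bytes, which
is why it tends to surface only in that remaining 1%.  A minimal sketch
(the values are made up for illustration):

  utf8 = "caf\u00e9"                            # UTF-8, contains a non-ASCII character
  euc  = "\u65e5\u672c\u8a9e".encode("EUC-JP")  # "日本語" transcoded to EUC-JP

  p (utf8 + "plain ASCII").encoding   # ASCII-only data is always compatible => UTF-8
  begin
    utf8 + euc                        # different encodings, both with non-ASCII bytes
  rescue Encoding::CompatibilityError => e
    p e.message                       # "incompatible character encodings: UTF-8 and EUC-JP"
  end

  p (utf8 + euc.encode("UTF-8")).encoding   # explicit transcoding avoids the error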

I believe that one reason for the difficulties we encounter now is
that String has historically been used for both binary and text data.
There is no clear separation between the two, which creates potential
for confusion and bugs.
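
For example (the file name is made up), the only thing that separates
"bytes" from "text" in 1.9 is the Encoding tag attached to what is
otherwise the very same String class:

  png  = File.open("logo.png", "rb") { |f| f.read }   # raw bytes
  text = "caf\u00e9"                                   # UTF-8 text

  p png.encoding    # #<Encoding:ASCII-8BIT>, i.e. "binary"
  p text.encoding   # #<Encoding:UTF-8>
  text + png        # raises Encoding::CompatibilityError as soon as both
                    # operands contain non-ASCII bytes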

A clean solution would probably involve a character type which is
capable of representing *all* possible symbols and modeling String as
a sequence of those characters.  Encoding would then be done during
input and output only (see the sketch after the questions below).
Questions I see:

1. Is this feasible, i.e. is there something similar to Unicode
without its limitations?

2. Is it fast enough for the general case?

3. What happens to binary Strings? Or, more generally:

4. What happens to old (pre 1.9) code?
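
As an illustration of "encoding only at input and output": 1.9 already
lets you push transcoding to the I/O boundary via external and internal
encodings on the IO object.  A rough sketch with made-up file names:

  # Read: external encoding Shift_JIS, internal encoding UTF-8, so data
  # arrives in memory as UTF-8 regardless of what is on disk.
  File.open("data.sjis.txt", "r:Shift_JIS:UTF-8") do |f|
    line = f.gets
    p line.encoding   # #<Encoding:UTF-8>
  end

  # Write: strings are transcoded to the external encoding on the way out;
  # Encoding::UndefinedConversionError is raised if a character cannot be
  # represented in the target encoding.
  File.open("out.euc.txt", "w:EUC-JP") do |f|
    f.puts "\u65e5\u672c\u8a9e"   # "日本語", written as EUC-JP bytes
  end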

i18n is a nasty beast...

Kind regards

robert

-- 
use.inject do |as, often| as.you_can - without end