On Oct 23, 2011, at 12:56 PM, Steve Klabnik wrote:

>> There really is a problem if I take a string encoded with UTF-8 and try to
>> concatenate it with a string encoded with 8859-1 (or one of the more exotic
>> character sets).  What I have never understood (and the Ruby people have
>> tried to educate me) is why, when I say "utf-8-string" + "8859-1-string",
>> Ruby can't just convert the latter to the encoding of the first, do the
>> concatenation and be done with it.
> 
> Well, you could make this argument for anything. Why does 2 + "hey"
> not call to_s on the 2 automatically? After all, you can convert 2 to
> "2" losslessly... there's a reason things are strongly typed.

Good example.  I guess I see converting strings as slightly different.  I
don't see an encoding as the same thing as a type.

> Also, I don't know the specifics of 8859-1, but there are some
> encodings that are just simply not compatible with each other. Try
> adding a UTF-8 string to an ASCII string, for example...

I believe your example is not right, but the general case is true.  If string A
is encoded in EnA and string B is encoded in EnB, then it can happen that A
cannot be re-encoded in EnB and B cannot be re-encoded in EnA.
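
A minimal sketch of both points, assuming Ruby 1.9 or later.  The helper
convertible? and the sample strings are made up for illustration; ISO-8859-5
(Cyrillic) stands in for "one of the more exotic character sets".

```ruby
# Hypothetical helper: true if str can be transcoded to the target encoding.
def convertible?(str, target)
  str.encode(target)
  true
rescue Encoding::UndefinedConversionError
  false
end

a = "caf\u00e9".encode("ISO-8859-1")            # "café" in EnA
b = "\u0416\u0443\u043A".encode("ISO-8859-5")   # Cyrillic text in EnB

a_to_enb = convertible?(a, "ISO-8859-5")   # false: 8859-5 has no e-acute
b_to_ena = convertible?(b, "ISO-8859-1")   # false: 8859-1 has no Cyrillic

# The ASCII case: an ASCII-only string is compatible with a UTF-8 string,
# so this concatenation succeeds without any explicit conversion.
ascii    = "abc".encode("US-ASCII")
combined = ascii + "\u00e9"
combined_encoding = combined.encoding      # Encoding::UTF_8
```

Neither a nor b fits in the other's encoding, yet the US-ASCII string joins
the UTF-8 string without complaint.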

But as far as I know, Unicode claims to be able to encode everything, and
UTF-8 is just one compact encoding of Unicode.  I believe (perhaps mistakenly)
that everything can be re-encoded to Unicode (and thus encoded as UTF-8).
Encoding everything in Unicode is how a lot of other languages deal with this
problem.
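
For the 8859-1 case, here is a rough sketch of the conversion I have in mind,
done by hand (again assuming Ruby 1.9 or later; the variable names are only
for illustration):

```ruby
utf8   = "na\u00efve "                      # UTF-8 string
latin1 = "caf\u00e9".encode("ISO-8859-1")   # same kind of text, 8859-1 bytes

# Concatenating directly raises, because both strings contain non-ASCII
# characters and carry different encodings.
mixed_error =
  begin
    utf8 + latin1
    nil
  rescue Encoding::CompatibilityError => e
    e
  end

# Transcoding the second string to the encoding of the first makes the
# concatenation work.
joined = utf8 + latin1.encode(utf8.encoding)
```

This only works if the encoding attached to each string is actually correct.
Ruby transcodes based on that label and does not guess it from the bytes.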

pedz