On Oct 23, 2011, at 12:56 PM, Steve Klabnik wrote:

>> There really is a problem if I take a string encoded with UTF-8 and try to
>> concatenate it with a string encoded with 8859-1 (or one of the more exotic
>> character sets). What I have never understood (and the Ruby people have
>> tried to educate me) is why, when I say "utf-8-string" + "8859-1-string",
>> Ruby can't just convert the latter to the encoding of the first, do the
>> concatenation and be done with it.
>
> Well, you could make this argument for anything. Why does 2 + "hey"
> not call to_s on the 2 automatically? After all, you can convert 2 to
> "2" losslessly... there's a reason things are strongly typed.

Good example. I guess I see converting strings as slightly different.
I don't see an encoding as the same thing as a type.

> Also, I don't know the specifics of 8859-1, but there are some
> encodings that are just simply not compatible with each other. Try
> adding a UTF-8 string to an ASCII string, for example...

I believe your example is not right, but the general case is true. If
string A is encoded in EnA and string B is encoded in EnB, then it can
happen that A cannot be re-encoded in EnB and B cannot be re-encoded
in EnA.

But as far as I know, Unicode claims to be able to encode everything,
and UTF-8 is just a compact encoding of Unicode. I believe (perhaps
mistakenly) that everything can be re-encoded to Unicode (and thus
encoded as UTF-8). Coding everything in Unicode is how a lot of other
languages deal with this problem.

pedz
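
P.S. For what it's worth, here is a minimal sketch of the cases discussed above, assuming Ruby 1.9 or later. The variable names are mine and purely illustrative. It tries the UTF-8 + 8859-1 concatenation, the explicit conversion I wish Ruby did for me, the ASCII + UTF-8 case from Steve's example, and one re-encoding that cannot succeed.

```ruby
# Sketch only: variable names are illustrative, not taken from the thread.
utf8   = "r\u00E9sum\u00E9"                 # \u escapes produce a UTF-8 string
latin1 = "caf\u00E9".encode("ISO-8859-1")   # same kind of text, stored as 8859-1 bytes

# 1. Two non-ASCII strings in different encodings: + raises rather than converting.
concat_error = begin
  utf8 + latin1
  nil
rescue Encoding::CompatibilityError => e
  e
end

# 2. Converting explicitly first works, since every 8859-1 character exists in Unicode.
joined = utf8 + latin1.encode("UTF-8")

# 3. An ASCII-only string concatenates with a UTF-8 string without complaint.
ascii = "abc ".encode("US-ASCII")
mixed = ascii + utf8

# 4. Going the other way can fail: the euro sign has no code point in 8859-1.
reverse_error = begin
  "\u20AC".encode("ISO-8859-1")
  nil
rescue Encoding::UndefinedConversionError => e
  e
end
```

So the conversion to UTF-8 is the safe direction, and the conversion away from it is the one that can lose information.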