Hi,

danielcavanagh / aanet.com.au wrote:

I don't mean to shoot you down in flames, but a lot of thought and effort 
has gone into Ruby's encoding support. Ruby could have followed the Python 
route of converting everything to Unicode, but that was rejected for various 
good reasons. Automatic transcoding to resolve incompatible encodings was 
also rejected because it causes a number of problems; in particular, I 
believe transcoding isn't necessarily accurate, because, for example, there 
may be multiple or ambiguous representations of the same character.

What *was* introduced is the concept of a "default_internal" encoding, 
which, if used by the programmer, causes I/O and other interfaces to 
transcode to the internal encoding on input and from it on output. 
Typically the default_internal encoding, if used, is UTF-8, and in that case 
the programmer has to accept that, when doing I/O to a file in a different 
encoding, the transcoding *may* cause data loss.
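A quick sketch of what I mean (the file name is just illustrative): once 
default_internal is set, data read from a file is transcoded from the file's 
external encoding to the internal one automatically.

```ruby
# Illustrative sketch: automatic input transcoding via default_internal.
require "tmpdir"

Encoding.default_internal = Encoding::UTF_8

path = File.join(Dir.tmpdir, "latin1.txt")
File.binwrite(path, "caf\xE9".b)  # "café" as raw ISO-8859-1 bytes (0xE9 = é)

text = File.read(path, encoding: "ISO-8859-1")
text.encoding  # => #<Encoding:UTF-8>, transcoded on input
text           # => "café"
```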


> we first add a function
> to do actual conversions between two encodings based on character, not
> just reinterpreting the byte values. so c in latin-1 (0x63) would become c
> in utf-32 (0x00000063).

String#encode does this, I believe.
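E.g. (not from your mail, just a quick check): String#encode converts by 
character, so "c" (0x63 in Latin-1) becomes code point U+0063 in UTF-32 
rather than a byte-for-byte reinterpretation.

```ruby
# String#encode converts characters, not raw bytes.
c = "c".encode("ISO-8859-1")
c.bytes                     # => [99]          (0x63)
c.encode("UTF-32BE").bytes  # => [0, 0, 0, 99] (0x00000063)
```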

> it could have lists of which encodings are
> supersets of other encodings

Unfortunately, it turns out that the only encoding we can reliably state is 
a subset of every other encoding is US-ASCII, and Ruby knows about this and 
optimizes for it.
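You can see that special case with Encoding.compatible?, which returns the 
encoding two strings can be combined in, or nil if they can't (the strings 
here are just examples):

```ruby
# ASCII-only strings mix freely with ASCII-compatible encodings;
# strings with non-ASCII characters in different encodings do not.
ascii  = "hello".encode("US-ASCII")
utf8   = "h\u00E9llo"                # "héllo" in UTF-8
latin1 = utf8.encode("ISO-8859-1")

Encoding.compatible?(ascii, utf8)   # => #<Encoding:UTF-8>
Encoding.compatible?(utf8, latin1)  # => nil (Encoding::CompatibilityError if concatenated)
```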

Cheers
Mike