On Mon, 15 Dec 2008 20:12:59 +1100, Yukihiro Matsumoto  
<matz / ruby-lang.org> wrote:

> You're right.  When we have two strings with identical byte sequence
> but different encodings, we have to tell they are different.  The
> comparison result does not matter much, so I used encoding index.
> Is there any alternative choice that makes sense?

It probably doesn't make sense to try to order two strings of incompatible
encodings, so what you have done is probably as good as anything else.
The only real alternative is to raise an Encoding::CompatibilityError, but
I don't think that is a good idea either, because you would want
	s1 == s2
to return false rather than raise an error on incompatible encodings. So if
you consider String#<=> as the "base" for all the string comparison methods
(whether implemented that way or not), then to be consistent with "==" it
would have to return a value for all possible encodings of s1 and s2,
compatible or not, which implies that String#>, String#<, etc. must all
return a value as well.
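To illustrate (a quick sketch against Ruby 1.9+ behaviour; the sign of <=>
when the bytes are equal follows the encoding index, as Matz describes, so
I only assume it is some non-zero Integer rather than nil):

```ruby
# Two strings with identical bytes but different, incompatible encodings.
s1 = "\xE3\x81\x82".dup.force_encoding("UTF-8")   # the character U+3042
s2 = "\xE3\x81\x82".dup.force_encoding("EUC-JP")  # same bytes, read as EUC-JP

p s1 == s2     # => false: equal bytes, but incompatible encodings
p s1 <=> s2    # a non-zero Integer (ordered by encoding index), not nil

# ASCII-only strings stay compatible across encodings, so they compare equal:
a1 = "abc".dup.force_encoding("UTF-8")
a2 = "abc".dup.force_encoding("US-ASCII")
p a1 == a2     # => true
```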

By the way, I think I phrased my description of the String method
implementations badly. I meant to say that strings are stored as the bytes
of their representation in their encoding, not as an array of codepoints.
There *are* some methods which must convert characters to codepoints for
their implementation, but this happens "on the fly". Many common String
methods (e.g. concatenation) operate directly on the bytes without needing
to convert to codepoints. I should also have pointed out that Ruby goes to
a lot of trouble to optimize methods operating on single-byte character
strings in order to keep their performance good.
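For example (a small sketch of the distinction; a UTF-8 string is stored
as its raw bytes, and only the character-aware methods decode them):

```ruby
s = "héllo"            # UTF-8 source; "é" is two bytes (0xC3 0xA9) in UTF-8
p s.bytesize           # => 6  (raw byte length of the buffer, no decoding)
p s.length             # => 5  (characters, decoded on the fly)
p s.bytes.take(3)      # => [104, 195, 169]  ("h", then the two bytes of "é")

# Concatenation just joins the two byte buffers (encodings permitting):
t = s + " world"
p t.encoding           # => #<Encoding:UTF-8>
p t.bytesize           # => 12
```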

Mike