Issue #9111 has been updated by nobu (Nobuyoshi Nakada).


sawa (Tsuyoshi Sawada) wrote:
> I suggest that the comparison `String#<=>` should not be based on the respective encoding of the strings, but all the strings should be internally converted to UTF-8 for the purpose of comparison.

It's unacceptable to always convert all strings to UTF-8; this should be restricted to comparison with an ASCII-8BIT string.
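
One possible reading of that restriction, as a rough sketch only: compare byte-wise when at least one operand is ASCII-8BIT, and otherwise leave `String#<=>` alone. The helper name `binary_aware_cmp` is hypothetical and not part of Ruby or of any proposed patch; the UTF-8 encoding of the literal is set explicitly so the example does not depend on the source encoding.

```ruby
# Hypothetical helper (an assumption for illustration, not an existing API):
# compares byte-wise only when at least one operand is ASCII-8BIT,
# and otherwise defers to the ordinary String#<=>.
def binary_aware_cmp(a, b)
  if a.encoding == Encoding::ASCII_8BIT || b.encoding == Encoding::ASCII_8BIT
    a.b <=> b.b   # String#b returns a copy tagged ASCII-8BIT
  else
    a <=> b
  end
end

packed  = [128].pack("C")                             # ASCII-8BIT
literal = "\x80".dup.force_encoding(Encoding::UTF_8)  # same byte, tagged UTF-8

binary_aware_cmp(packed, literal)   # => 0
```

Under this reading, two strings that are both non-binary keep their current comparison semantics.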

----------------------------------------
Feature #9111: Encoding-free String comparison
https://bugs.ruby-lang.org/issues/9111#change-42935

Author: sawa (Tsuyoshi Sawada)
Status: Open
Priority: Normal
Assignee: 
Category: 
Target version: 


=begin
Currently, strings with the same bytes but different encodings count as different strings when they contain non-ASCII bytes. This causes strange behaviour as shown below (noted in the Stack Overflow question http://stackoverflow.com/questions/19977788/strange-behavior-in-packed-ruby-strings#19978206):

    [128].pack("C")             # => "\x80"
    [128].pack("C") == "\x80"   # => false

Since `[128].pack("C")` has the encoding ASCII-8BIT and `"\x80"` (by default) has the encoding UTF-8, the two strings are not equal.
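
The difference can be made visible with a short sketch. The variable names are only illustrative, and the UTF-8 encoding is set explicitly here (in a UTF-8 source file the literal `"\x80"` already carries it) so that the example does not depend on the source encoding.

```ruby
packed  = [128].pack("C")
# In a UTF-8 source file the literal "\x80" is already tagged UTF-8;
# the tag is applied explicitly so the sketch behaves the same everywhere.
literal = "\x80".dup.force_encoding(Encoding::UTF_8)

same_bytes     = packed.bytes == literal.bytes          # => true, both are [128]
same_encoding  = packed.encoding == literal.encoding    # => false
equal_as_is    = packed == literal                      # => false
# Retagging a copy as ASCII-8BIT makes the two strings compare equal.
equal_retagged = packed == literal.dup.force_encoding(Encoding::ASCII_8BIT)  # => true
# A lone 0x80 byte is not a valid UTF-8 sequence.
valid_utf8     = literal.valid_encoding?                 # => false
```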

Also, comparing strings with different encodings may produce a confusing, unintended result.

I suggest that the comparison `String#<=>` should not be based on the respective encoding of the strings, but all the strings should be internally converted to UTF-8 for the purpose of comparison.


=end


-- 
http://bugs.ruby-lang.org/