Feature #695: More flexibility when combining ASCII-8BIT strings with other encodings
http://redmine.ruby-lang.org/issues/show/695

Author: Michael Selig
Status: Open, Priority: Normal
Category: M17N

Consider the following 3 Ruby statements:

# Array#pack always returns ASCII-8BIT
s1 = [97, 98, 99, 1589].pack("U*")

# \xNN returns a string in the source encoding (even if the result is invalid in that encoding), or ASCII-8BIT if no source encoding is set
s2 = "abc\xD8\xB5"

# \uNNNN always returns UTF-8
s3 = "abc\u0635"

s1, s2, and s3 all contain the same bytes but have different encodings. When you try to combine them, you get different "encoding compatibility" errors, and which errors you get depends on the source encoding, because the encoding of s2 follows it.
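
As a rough illustration of the problem, the sketch below rebuilds the same bytes with explicit encodings. It is an assumption-laden stand-in for the statements above: it uses String#b to get an ASCII-8BIT copy instead of relying on pack or on the source encoding, since what those return may differ between Ruby versions. The variable names are only for illustration.

```ruby
# Same five bytes as in the report, tagged explicitly so the result does
# not depend on the Ruby version or on the source encoding.
binary = "abc\xD8\xB5".b   # copy tagged ASCII-8BIT
utf8   = "abc\u0635"       # \u escapes yield a UTF-8 string

# The contents are byte-for-byte identical; only the encoding tag differs.
same_bytes = binary.bytes == utf8.bytes

# Combining them fails because both contain non-ASCII bytes and the
# encodings differ.
error = begin
  binary + utf8
  nil
rescue Encoding::CompatibilityError => e
  e
end

# The manual workaround: retag a copy by hand before combining.
joined = binary.dup.force_encoding(Encoding::UTF_8) + utf8
```

Here same_bytes is true, error holds the Encoding::CompatibilityError raised by the plain concatenation, and joined is a UTF-8 string.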

I would like Ruby to be able to combine all of the above without error. I don't think it is reasonable to have to use "force_encoding" in these cases. This would:
- give better compatibility with 1.8;
- make it much easier to handle methods that return ASCII-8BIT strings (e.g. Array#pack, and libraries which return strings in ASCII-8BIT because the encoding is unknown);
- reduce the confusion caused by "\x" producing a string whose encoding depends on the source encoding (which I dislike; I think it should always return ASCII-8BIT).

So the feature request is:

When combining two strings, one of which is ASCII-8BIT and the other in some encoding "E":
1) If the ASCII-8BIT string is valid when forced to encoding E, treat it as being in encoding E;
2) Otherwise, treat both strings as ASCII-8BIT.

Part (2) is less important, and can probably be omitted if it is hard to implement.
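
The two-part rule above could be sketched in plain Ruby roughly as follows. This is only an illustration of the requested behaviour, not how Ruby implements string concatenation; `combine` is a hypothetical helper name, and it assumes String#b is available for making an ASCII-8BIT copy.

```ruby
BINARY = Encoding::ASCII_8BIT

# Hypothetical helper: concatenates a and b under the proposed rule.
def combine(a, b)
  a_bin = a.encoding == BINARY
  b_bin = b.encoding == BINARY
  # The rule only applies when exactly one operand is ASCII-8BIT;
  # otherwise fall back to Ruby's normal behaviour.
  return a + b if a_bin == b_bin

  bin, other = a_bin ? [a, b] : [b, a]
  # Reinterpret a copy of the ASCII-8BIT string in encoding E.
  as_e = bin.dup.force_encoding(other.encoding)

  if as_e.valid_encoding?
    # Part (1): the bytes are valid in E, so treat them as E.
    a_bin ? as_e + b : a + as_e
  else
    # Part (2): not valid in E, so treat both strings as ASCII-8BIT.
    a.b + b.b
  end
end

ok  = combine("abc\xD8\xB5".b, "abc\u0635")  # bytes valid as UTF-8
raw = combine("\xFF".b, "abc\u0635")         # 0xFF is never valid UTF-8
```

With this sketch, ok comes back as a UTF-8 string and raw as an ASCII-8BIT string, and neither call needs an explicit force_encoding at the call site.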

Thank you
Michael Selig

