Hi,

Feature #695 was closed & marked done, but unfortunately it does not seem  
to have been implemented :-(

The request was:

> When combining 2 strings, one being ASCII-8BIT and the other being in
> encoding "E":
> 1) If the ASCII-8BIT string is valid if forced to encoding E, then treat  
> the ASCII-8BIT string as being in encoding E;
> 2) Otherwise treat both strings as ASCII-8BIT.
>
> Part (2) is less important, and can probably be omitted if it is hard to  
> implement.

However:

ruby -Kn -ve 'p "abc\xD8\xB5" + "abc\u0635"'
ruby 1.9.0 (2008-10-30 revision 20062) [i686-linux]
-e:1:in `<main>': incompatible character encodings: ASCII-8BIT and UTF-8  
(Encoding::CompatibilityError)

(The -Kn is only necessary here because, with -e, ruby uses the locale to
determine the encoding of the string containing "\x".)

I did think this feature had been implemented very quickly!
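
For comparison, here is a minimal sketch of the explicit workaround that is
needed at the moment. It assumes the ASCII-8BIT bytes really are UTF-8 text;
the variable names are only for illustration, and pack("C*") is used so that
the result is ASCII-8BIT regardless of locale or -K options.

```ruby
# Illustrative values only: the same bytes as in the failing example above.
packed = [97, 98, 99, 0xD8, 0xB5].pack("C*")  # "C*" output is ASCII-8BIT
text   = "abc\u0635"                          # \u literals are UTF-8

begin
  packed + text
  outcome = :combined
rescue Encoding::CompatibilityError
  outcome = :incompatible                     # what happens without retagging
end

# Explicit workaround: retag a copy of the bytes as UTF-8 before combining.
joined = packed.dup.force_encoding(text.encoding) + text
```

The request is about making that force_encoding step unnecessary.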

What appears to have been implemented instead is that "Array#pack" output
is marked UTF-8 when the template contains "U".
However, I am not convinced that even this was done correctly: the pack
output now seems to be marked UTF-8 even if the template mixes "U" with
other directives, which can result in an invalid UTF-8 string.

My feature request would mean that "pack" output and "\x" string literals
could be left as ASCII-8BIT, and be "forced" to another encoding
transparently depending on how the programmer uses them.

You can liken this feature to the transparent conversion of an integer to  
a float when doing arithmetic.
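
To make the rule quoted at the top concrete, it could be sketched in plain
Ruby roughly as follows. This is only an illustration of the intended
semantics, not a proposed patch: the helper name combine_lenient is made up,
and a real implementation would presumably live in the encoding
compatibility check in C rather than in a wrapper method.

```ruby
# Hypothetical helper, not part of Ruby: a sketch of the rule requested in #695.
def combine_lenient(a, b)
  binary = Encoding::ASCII_8BIT
  a_bin = a.encoding == binary
  b_bin = b.encoding == binary
  # Only the mixed case (exactly one operand is ASCII-8BIT) is special;
  # everything else goes through the normal String#+.
  return a + b if a_bin == b_bin

  target = a_bin ? b.encoding : a.encoding   # this is encoding "E"
  a_e = a.dup.force_encoding(target)
  b_e = b.dup.force_encoding(target)
  if a_e.valid_encoding? && b_e.valid_encoding?
    a_e + b_e                                # part (1): treat both as E
  else
    # part (2): fall back to treating both as raw bytes
    a.dup.force_encoding(binary) + b.dup.force_encoding(binary)
  end
end
```

With this sketch, combining [97, 98, 99, 0xD8, 0xB5].pack("C*") with
"abc\u0635" gives a UTF-8 string, while combining a lone "\xFF" byte tagged
ASCII-8BIT with the same UTF-8 string gives an ASCII-8BIT result.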

If you agree that this is a good idea, I don't mind trying to produce a  
patch for it myself. Please let me know.

Cheers
Mike

On Wed, 29 Oct 2008 14:53:15 +1100, Michael Selig <redmine / ruby-lang.org>  
wrote:

> Feature #695: More flexibility when combining ASCII-8BIT strings with  
> other encodings
> http://redmine.ruby-lang.org/issues/show/695
>
> Author: Michael Selig
> Status: Open, Priority: Normal
> Category: M17N
>
> Consider the following 3 Ruby statements:
>
> # Array#pack always returns ASCII-8BIT
> s1 = [97, 98, 99, 1589].pack("U*")
>
> # \xNN returns the source encoding (even if it is an invalid string), or
> # ASCII-8BIT if not set
> s2 = "abc\xD8\xB5"
>
> # \uNNNN always returns UTF-8
> s3 = "abc\u0635"
>
> All of s1, s2, and s3 have the same contents, but different encodings.  
> When you try to combine them, you get different "encoding compatibility"  
> problems, which can change depending on the source encoding, due to the  
> treatment of s2.
>
> I would like to see Ruby be able to combine all the above without error.  
> I don't think it is reasonable to have to use "force_encoding" in these  
> cases. This would
> - give better compatibility with 1.8,
> - make handling of methods returning ASCII-8BIT strings much easier (e.g.
> Array#pack and libraries which return strings in ASCII-8BIT because the
> encoding is unknown)
> - reduce the confusion caused with "\x" producing a string which depends  
> on the source encoding (which I dislike - I think it should always  
> return ASCII-8BIT).
>
> So the feature request is:
>
> When combining 2 strings, one being ASCII-8BIT and the other being in
> encoding "E":
> 1) If the ASCII-8BIT string is valid if forced to encoding E, then treat  
> the ASCII-8BIT string as being in encoding E;
> 2) Otherwise treat both strings as ASCII-8BIT.
>
> Part (2) is less important, and can probably be omitted if it is hard to  
> implement.
>
> Thank you
> Michael Selig
>
>
> ----------------------------------------
> http://redmine.ruby-lang.org