On Fri, 12 Dec 2008 05:34:38 +1100, Dave Thomas <dave / pragprog.com> wrote:

> Right now, we have the strange situation that
>
>    "cat".to_sym.to_s.encoding != "cat".encoding

Yes, this seems to be an inconsistency, though in practice I don't think  
it causes any problems.
I seem to recall that a few months ago the parser "optimized" strings to
US-ASCII when the source encoding was UTF-8 (or any other ASCII-compatible
encoding), but this behaviour changed at some point. Perhaps this
inconsistency is a remnant of that?
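For what it's worth, here is a quick sketch of the mismatch and why it rarely bites. It assumes Ruby 2.0 or later, where ASCII-only symbols are interned as US-ASCII. The variable names are just for illustration, and the string is explicitly encoded to UTF-8 so the result does not depend on the script's source encoding.

```ruby
# Sketch only: assumes Ruby 2.0+, where ASCII-only symbols are US-ASCII.
# Encode explicitly so the demo does not depend on the source encoding.
str = "cat".encode(Encoding::UTF_8)
sym_str = str.to_sym.to_s

puts str.encoding      # UTF-8 (forced above)
puts sym_str.encoding  # US-ASCII, because the symbol's text is ASCII-only

# String#== treats ASCII-only strings in ASCII-compatible encodings as
# comparable, so the differing encodings do not change the comparison.
puts sym_str == str
```

Since the comparison still succeeds, the encoding difference only shows up if you inspect #encoding directly.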

I would also like to point out a couple of other inconsistencies with  
symbols:

1) "p" seems to do the wrong thing with symbol encodings, yet inspect is
OK:

As a string:
p "\u0639abc" => "عabc"
p "\u0639abc".force_encoding("BINARY") => "\xD8\xB9abc"

As a symbol:
p "\u0639abc".to_sym => :عabc
p "\u0639abc".force_encoding("BINARY").to_sym => :عabc

Using inspect:
"\u0639abc".to_sym.inspect => ":عabc"
"\u0639abc".force_encoding("BINARY").to_sym.inspect => ":\xD8\xB9abc"

The annoying thing about this is that, when you use "p", two symbols with
different encodings can look the same even though they are actually
different symbols (different ids).
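A sketch of the point, assuming Ruby 2.0 or later (String#b returns a binary copy, equivalent to dup.force_encoding("BINARY")). describe_symbol is a hypothetical debugging helper, not part of Ruby: printing the encoding and raw bytes next to the symbol makes the two cases easy to tell apart regardless of how p renders them.

```ruby
# Sketch only: assumes Ruby 2.0+ (String#b).
utf8_sym   = "\u0639abc".to_sym    # a \u escape makes the literal UTF-8
binary_sym = "\u0639abc".b.to_sym  # same bytes, tagged ASCII-8BIT (BINARY)

# Hypothetical helper: show a symbol together with its encoding and raw
# bytes (hex), so symbols that render alike can still be distinguished.
def describe_symbol(sym)
  hex = sym.to_s.bytes.map { |b| format('%02X', b) }.join(' ')
  "#{sym.inspect} (#{sym.encoding}, bytes: #{hex})"
end

puts describe_symbol(utf8_sym)
puts describe_symbol(binary_sym)

puts utf8_sym.to_s.bytes == binary_sym.to_s.bytes # identical bytes...
puts utf8_sym == binary_sym                        # ...but distinct symbols
```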


2) Symbol#== rdoc says "If sym and obj are exactly the same symbol,  
returns true. Otherwise, compares them as strings."
I don't think this is right:
p :cat == "cat" => false

It works like this in 1.8 also. I think this is just a documentation  
error, and the "Otherwise, compares them as strings" should be dropped.
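A quick check of the behaviour (a sketch using only core Ruby). If a string comparison is what you want, it has to be done explicitly on one side or the other:

```ruby
puts :cat == "cat"          # false: a Symbol is never == a String
puts :cat.to_s == "cat"     # true: compare explicitly as strings
puts :cat == "cat".to_sym   # true: or convert the String to a Symbol first
```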

Cheers
Mike