On 5/15/07, Brian Candler <B.Candler / pobox.com> wrote:

> P.S. I'm aware of Symbol#to_i, but to_i and object_id appear to be
> intimately related:
>
> irb(main):001:0> :foo.to_i
> => 14817
> irb(main):002:0> :foo.object_id
> => 148178
> irb(main):003:0> :bar.to_i
> => 16081
> irb(main):004:0> :bar.object_id
> => 160818
> irb(main):005:0> :zzzzzzzzzzzzzzzz.to_i
> => 16089
> irb(main):006:0> :zzzzzzzzzzzzzzzz.object_id
> => 160898

Here's part of the Ruby 1.8.5 code that computes an object's object_id
from its reference value:

    if (TYPE(obj) == T_SYMBOL) {
        return (SYM2ID(obj) * sizeof(RVALUE) + (4 << 2)) | FIXNUM_FLAG;
    }

where SYM2ID is a C macro that shifts the value right 8 bits, and
FIXNUM_FLAG is the low tag bit that marks the result as a Fixnum.

And here's the code for Symbol#to_i:
static VALUE
sym_to_i(sym)
    VALUE sym;
{
    ID id = SYM2ID(sym);

    return LONG2FIX(id);
}
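Putting the two excerpts together gives a closed form for the pattern in the irb session. Writing n for the value of Symbol#to_i, assuming sizeof(RVALUE) = 20 (consistent with the session above; other builds differ), and using the Fixnum encoding in which a VALUE v stands for the integer (v - 1)/2:

```latex
\begin{aligned}
v &= (20n + 16) \mathbin{|} 1 = 20n + 17 && \text{(since } 20n + 16 \text{ is even)}\\
\texttt{object\_id} &= \frac{v - 1}{2} = 10n + 8
\end{aligned}
```

For :foo, 10 * 14817 + 8 = 148178, which is exactly what irb printed.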


> i.e. I don't think the symbol table maintains an explicit integer key for
> each symbol.

Actually it does, judging from my recent reading of the Ruby 1.8.5 code.

It keeps two internal hash tables: one maps the string representation
to the integer representation, and the other maps the integer back to
the string.

The code for String#to_sym basically does this:

It calls rb_intern to get the integer representation, called the id,
and returns ID2SYM(id). ID2SYM shifts the id left 8 bits and puts the
symbol flag in the freed low byte; in other words, it's the inverse of
SYM2ID.

rb_intern first searches for the string in the symbol table and, if
it finds it, returns the id stored there.

Otherwise, it calculates the integer representation by shifting the
next available id left by 3 bits and ORing in some flag bits that
depend on the contents of the string; for example, if the string
starts with a single "@", it's flagged as an instance variable name.

It then makes a copy of the string and does the equivalent of

    sym_table[stringcopy] = newly_computed_id
    sym_rev_table[newly_computed_id] = stringcopy

These two aren't Ruby Hash objects, though, but C hash tables.

FWIW, Ruby Hash objects use the same C hash-table code internally.

What's interesting is that a reference to a symbol doesn't actually
point to an allocated object: the reference value itself encodes the
id, which is why SYM2ID can recover it with a simple shift.

-- 
Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/