On Feb 5, 2006, at 5:05 AM, Mauricio Fernandez wrote:
> On Sun, Feb 05, 2006 at 08:33:40PM +0900, Christian Neukirchen wrote:
>> Caleb Clausen <vikkous / gmail.com> writes:
>>> 100_000.times{|n|
>>>   o=Object.new;
>>>   i=o.__id__;
>>>   o2=ObjectSpace._id2ref(i);
>>>   o.equal? o2 or raise "o=#{o}, i=#{"%x"%i}, o2=#{o2.inspect}, n=#{n}"
>>> }
>>
>> I can reproduce on ruby 1.8.4 (2005-12-24) [powerpc-darwin7.9.0]:
>>
>> o=#<Object:0x1d421c>, i=ea10e, o2=:reject, n=448 (RuntimeError)
>>
>> It looks like the object id wrapped in some way and now points to a
>> symbol?  Clearly looks like a bug.
>
> 0x1d421c.to_s(2)                  # => "111010100001000011100"
> 0xea10e.to_s(2)                   # => "11101010000100001110"
> 0xea10e.class                     # => Fixnum
> (2 * 0xea10e).to_s(2)             # => "111010100001000011100"
>
> So far so good.
>
> Now, in gc.c:
>
>     p0 = ptr = NUM2ULONG(id);
>     if (ptr == Qtrue) return Qtrue;
>     if (ptr == Qfalse) return Qfalse;
>     if (ptr == Qnil) return Qnil;
>     if (FIXNUM_P(ptr)) return (VALUE)ptr;
>     if (SYMBOL_P(ptr) && rb_id2name(SYM2ID((VALUE)ptr)) != 0) {
> 	return (VALUE)ptr;
>     }
>
> (SYMBOL_FLAG == 0x0e)
>
> NUM2ULONG is rb_num2ulong, which calls rb_num2long, which uses FIX2LONG.
> id was 111010100001000011101b and ptr becomes 11101010000100001110b,
> which matches the SYMBOL_FLAG.
>
> I'd conjecture that the above works on Linux because glibc's malloc()
> always returns 8-byte aligned memory addresses, which doesn't seem to
> be the case in OSX:
>
>  0x1d421c % 8                                      # => 4

OS X's malloc aligns memory on 16 byte boundaries.  This problem is  
not unique to OS X, you just need enough symbols.
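
A minimal Ruby sketch of the check quoted from gc.c, to make the arithmetic concrete. It assumes what the numbers in this thread show for 1.8.4: a heap object's id is its address divided by 2, and a Symbol's VALUE is (ID << 8) | 0x0e. The method name and the max_symbol_id bound are hypothetical; the real code asks rb_id2name whether the ID exists, which a simple upper bound only approximates.

```ruby
SYMBOL_FLAG = 0x0e  # low byte of a Symbol VALUE, per the gc.c excerpt above

# Hypothetical helper: would _id2ref mistake the object at +address+ for
# a Symbol? max_symbol_id stands in for "rb_id2name finds this ID".
def mistaken_for_symbol?(address, max_symbol_id)
  id = address >> 1                      # object id as seen by NUM2ULONG
  return false unless (id & 0xff) == SYMBOL_FLAG
  (id >> 8) <= max_symbol_id             # SYM2ID, then the existence check
end

mistaken_for_symbol?(0x1d421c, 4000)     # => true
mistaken_for_symbol?(0xb7d44d7c, 4000)   # => false
```

With 4000 as a stand-in for the highest symbol ID, the object from the original report collides (its ID would be 3745), while the Linux address does not even carry the flag byte.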

> Another possibility would be that the address space for the data
> segment used in OSX is lower than on Linux, so the SYM2ID matches an
> existing symbol:
>
> RUBY_PLATFORM                                      # => "i686-linux"
> Object.new.inspect                                 # => "#<Object:0xb7d44d7c>"
> 0xb7d44d7c >> 9                                    # => 6023718
> # we shouldn't have 6 million symbols
> 0x1d421c >> 9                                      # => 3745
> # but 4000 are indeed possible

If you're close enough to the beginning of memory, ObjectSpace._id2ref
will pick a Symbol over the real object, as you mention above:

$ cat symbol_object_overlap.rb
N = 100_000
Objs = Array.new N
Syms = Array.new 200
STR = 'new_symbol_base'

def symbol_info
   syms = Symbol.all_symbols.sort_by { |s| s.object_id }
   min = syms.first
   max = syms.last
   puts "found #{syms.length} symbols"
   puts "first symbol id: 0x%x (%p) last symbol id: 0x%x (%p)" %
     [min.object_id, min, max.object_id, max]
end

def make_objs
   N.times { |n| Objs[n] = Object.new }

   puts "Made #{N} objects."
   puts "Object ruby heap use:"
   puts "start object_id <--> end object_id (range)"
   first = Objs[0]
   last = Objs[0]
   Objs.each do |o|
     if o.object_id > last.object_id  then
       fid = first.object_id
       lid = last.object_id
       puts "0x%x <--> 0x%x (%d)" % [lid, fid, fid - lid]
       first = o
       last = o
     else
       last = o
     end
   end
end

def make_more_syms
   N.times do
     STR.intern
     STR.succ!
   end
   puts "Created #{N} new symbols"
end

def count_symbols
   count = 0
   Objs.each do |o|
     if Symbol === ObjectSpace._id2ref(o.object_id) then
       Syms[count] = o
       count += 1
     end
   end
   puts "Found #{count} symbols overlapping real objects in #{N} objects lookups"
end

symbol_info

make_objs

count_symbols

make_more_syms

count_symbols

symbol_info

#Syms.each do |s|
#  puts "0x%x: %p ==> %p" % [s.object_id, s, ObjectSpace._id2ref(s.object_id)]
#end

On OS X, malloc starts allocating memory from a very low address, so  
even the built-in symbols for a small program will overlap valid  
object addresses:

$ uname -a
Darwin kaa.local 8.5.0 Darwin Kernel Version 8.5.0: Sun Jan 22 10:38:46 PST 2006; root:xnu-792.6.61.obj~1/RELEASE_PPC Power Macintosh powerpc
$ ruby -v symbol_object_overlap.rb
ruby 1.8.4 (2005-12-24) [powerpc-darwin8.4.0]
found 940 symbols
first symbol id: 0x210e (:"!") last symbol id: 0x27510e (:count_symbols)
Made 100000 objects.
Object ruby heap use:
start object_id <--> end object_id (range)
0xd7800 <--> 0xe47d0 (53200)
0xe4820 <--> 0x1df716 (1027830)
0x282800 <--> 0x2d1996 (323990)
Found 41 symbols overlapping real objects in 100000 objects lookups
Created 100000 new symbols
Found 189 symbols overlapping real objects in 100000 objects lookups
found 101030 symbols
first symbol id: 0x210e (:"!") last symbol id: 0xc5c510e (:new_symbol_gsqh)

FreeBSD starts returning memory from a much higher memory address so  
symbol overlaps take much longer to occur:

$ uname -a
FreeBSD sandbox.robotcoop.com 4.10-RELEASE FreeBSD 4.10-RELEASE #0: Wed Feb 23 15:47:08 CST 2005     root@fbsdbootload:/usr/obj/usr/src/sys/theplanet  i386
$ ruby -v symbol_object_overlap.rb
ruby 1.8.4 (2005-12-24) [i386-freebsd4]
found 931 symbols
first symbol id: 0x210e (:"!") last symbol id: 0x27090e (:count_symbols)
Made 100000 objects.
Object ruby heap use:
start object_id <--> end object_id (range)
0x4039000 <--> 0x404611a (53530)
0x4046142 <--> 0x40c3f16 (515540)
0x40c4000 <--> 0x4113196 (323990)
Found 0 symbols overlapping real objects in 100000 objects lookups
Created 100000 new symbols
Found 196 symbols overlapping real objects in 100000 objects lookups
found 101028 symbols
first symbol id: 0x210e (:"!") last symbol id: 0xc5c090e (:new_symbol_gsqh)
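
Both runs can be read straight off the printed ids: heap object ids and symbol ids share one number space, so a collision is only possible once the two ranges overlap. A rough sketch of that comparison, using the values printed above; the helper name is made up, and an overlap is a necessary condition only, since an object's id must also equal an id some symbol actually has.

```ruby
# Hypothetical helper: can any heap object id coincide with a symbol id?
# True when the two closed ranges of ids intersect.
def id_ranges_overlap?(heap_first, heap_last, sym_first, sym_last)
  heap_first <= sym_last && sym_first <= heap_last
end

# OS X, built-in symbols only
id_ranges_overlap?(0xd7800, 0x2d1996, 0x210e, 0x27510e)      # => true
# FreeBSD, built-in symbols only
id_ranges_overlap?(0x4039000, 0x4113196, 0x210e, 0x27090e)   # => false
# FreeBSD, after 100000 extra symbols
id_ranges_overlap?(0x4039000, 0x4113196, 0x210e, 0xc5c090e)  # => true
```

That matches the counts reported by the script: overlaps on OS X from the start, none on FreeBSD until the symbol table grows past the first heap.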

-- 
Eric Hodel - drbrain / segment7.net - http://segment7.net
This implementation is HODEL-HASH-9600 compliant

http://trackmap.robotcoop.com