Issue #15408 has been updated by headius (Charles Nutter).


> object_id does precisely what I expect.

Then your expectations do not include that it's actually an ID, since it's literally just a pointer into the heap. In the short term, the slot at that address will likely be occupied by other objects once the original is collected. Longer term, for Ruby, that pointer value will actually change as the heap gets compacted or objects are moved to other generations.

The problems of object_id could possibly be solved with a rename or better documentation. I think it needs to either be:

* A strictly monotonically increasing value. As you point out, it would be difficult with current Ruby implementations on current hardware to blow out the 62-bit limit. However, that integer needs to be atomically updated for every object that needs an ID. It would still be best to do this lazily, only for objects where it's needed. This is exactly the JRuby implementation right now (though we don't lose those two bits).

OR

* A pseudo-random hash value never guaranteed to be unique but guaranteed to have a reasonable hash distribution. This is the JVM's `identityHashCode` which JRuby uses for the base `hash` value for all objects. MRI currently uses `object_id` both as a base hash and as an unreliable pseudo-ID.
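
To make the first option concrete, here is a minimal sketch of a lazily assigned, monotonically increasing ID. The `StableId` module, its `stable_id` method, and the `Widget` class are hypothetical names used only for illustration. This is not JRuby's actual implementation, and it assumes the object is not frozen, since the ID is cached in an instance variable.

```ruby
# Hypothetical mixin (not part of Ruby or JRuby): hands out IDs from a
# process-wide counter, and only to objects that actually ask for one.
module StableId
  LOCK = Mutex.new
  @last_id = 0

  class << self
    # Bump the shared counter. Callers must already hold LOCK.
    def reserve
      @last_id += 1
    end
  end

  # Assigned on the first call, then cached on the object itself.
  # The check and the assignment happen under one lock, so two threads
  # racing on the same object still agree on a single ID.
  def stable_id
    LOCK.synchronize { @stable_id ||= StableId.reserve }
  end
end

class Widget
  include StableId
end

a = Widget.new
b = Widget.new
c = Widget.new          # never asks for an ID, so never consumes one

first_b = b.stable_id   # first request gets the lowest ID
first_a = a.stable_id   # later request gets a higher ID
```

Because the counter never goes backwards, an ID handed out once is never given to a different object in the same process, even after the original object is collected. The cost is the lock (or an atomic increment) on the first request for each object.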

If `_id2ref` goes away and `object_id` becomes one of the above, that's likely acceptable. I don't like changing how `object_id` works in such a drastic way without naming it something more appropriate, though.

> If so, let's fix the documentation of object_id.

Or we fix object_id to actually be an ID. Or we get rid of it and replace it with something more like a base hash. Both are better options than leaving it in place, since it's not an ID, it's not idempotent, and it doesn't do what most people expect.

>> They eventually lead to difficult-to-diagnose bugs in every possible use case.
> How?

Nearly all uses of `object_id` I have seen treat it as a reliable alias for the object itself. All such code is broken. Exceptions include `object_id` used solely for base object hash calculation or inspect output, neither of which really requires uniqueness.

> Sources of the embedded ruby portion of a robust C++ desktop application continuously developed for 15+ years

This proves nothing without knowing how `object_id` is being used. What are you using those `object_id`s for? Show us please.

> object_id has never been broken. No need to tar it with id2ref's failings.

One of the primary use cases of `object_id` is pairing it with `_id2ref`. As I've said a couple times, `_id2ref` most definitely needs to go away. Once it does, we have to decide what `object_id` is really supposed to be, because it can't be what it is now and be safely usable for more than logging or debugging.

> Sorry, subtract the two bits needed to distinguish Fixnum from other objects. Still: 146 years?

Assuming 64-bit systems, you're right, it would take a long time with current Ruby implementations. That's why we thought it acceptable to implement it this way in JRuby many years ago, since JRuby has always had 64-bit Fixnums.

> On the other hand, if object_id is just a form of the pointer...

Hopefully it's clear by now that it can't just be a form of the pointer, since the pointers are reused today and will be reused even more in the future.

----------------------------------------
Feature #15408: Deprecate object_id and _id2ref
https://bugs.ruby-lang.org/issues/15408#change-75647

* Author: headius (Charles Nutter)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
Ruby currently provides the object_id method to get an "identifier" for a given object. According to the documentation, this ID is the same for every object_id call against a given object, and guaranteed not to be the same as any other active (i.e. alive) object. However, no guarantee is made about the ID being reused for a future object after the original has been garbage collected.

As a result, object_id can't be used to uniquely identify any object that might be garbage collected, since that ID may be associated with a completely different object in the future.

Ruby also provides a method to go from an object_id to the object reference itself: ObjectSpace._id2ref. This method has been in Ruby for decades and is often used to implement a weak hashmap from ID to reference, since holding the ID will not keep the object alive. However, because object_id is not actually unique over time, _id2ref can return a different object than the one that originally had that ID, as object slots are reused in the heap.
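
The hazard can be illustrated with a toy model. The `ToyHeap` class below is purely hypothetical and is not how MRI's heap works; it only imitates the one property described above, that an ID derived from a slot position gets reused once the slot is freed.

```ruby
# Toy model only: a slot index stands in for a heap address, and that
# index doubles as the "object_id".
class ToyHeap
  def initialize
    @slots = []   # slot index => object
    @free  = []   # indexes of slots whose object was "collected"
  end

  # Store obj in a freed slot if one exists, otherwise grow the heap.
  # Returns the slot index, which plays the role of object_id.
  def allocate(obj)
    id = @free.pop || @slots.size
    @slots[id] = obj
    id
  end

  # Simulate garbage collection of the object in the given slot.
  def collect(id)
    @slots[id] = nil
    @free.push(id)
  end

  # Plays the role of ObjectSpace._id2ref.
  def id2ref(id)
    @slots[id]
  end
end

heap = ToyHeap.new
session_id = heap.allocate("user session")    # caller remembers this ID
heap.collect(session_id)                       # original object goes away
other_id = heap.allocate("unrelated string")   # freed slot is reused
stale_lookup = heap.id2ref(session_id)         # resolves to the wrong object
```

In this model the remembered ID silently resolves to `"unrelated string"`. Nothing signals to the caller that the object it originally referred to is gone.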

The only way to implement object_id safely (with idempotency guarantees) would be to assign to all objects a monotonically-increasing ID. Alternatively, this ID could be assigned lazily only for those objects on which the code calls object_id. JRuby implements object_id in this way currently.

The only way to implement _id2ref safely would be to have a mapping in memory from those monotonically-increasing IDs to the actual objects. This would have to be a weak mapping, so that the mapping itself does not keep the objects from being garbage collected. JRuby currently only supports _id2ref via a flag, since the additional overhead of weakly tracking every requested object_id is extremely high. An alternative for MRI would be to implement _id2ref as a heap scan, as it is implemented in Rubinius. This would make it entirely impractical due to the cost of scanning the heap for every ID lookup.
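
As a rough sketch of such a weak mapping in plain Ruby, the hypothetical `IdRegistry` class below pairs a monotonically increasing counter with `WeakRef` from the standard library. It is an illustration under stated assumptions, not how JRuby implements its flag-gated _id2ref. For brevity, `register` hands out a fresh ID on every call rather than remembering one ID per object.

```ruby
require "weakref"

# Hypothetical registry: maps never-reused integer IDs to weakly held objects.
class IdRegistry
  def initialize
    @lock = Mutex.new
    @last_id = 0
    @refs = {}   # Integer ID => WeakRef wrapping the object
  end

  # Reserve a new ID and remember the object without keeping it alive.
  def register(obj)
    @lock.synchronize do
      id = (@last_id += 1)
      @refs[id] = WeakRef.new(obj)
      id
    end
  end

  # Return the object for id, or nil if the ID is unknown or the object
  # has been garbage collected. A dead entry is pruned when discovered.
  def lookup(id)
    @lock.synchronize do
      ref = @refs[id]
      return nil unless ref
      begin
        ref.__getobj__
      rescue WeakRef::RefError
        @refs.delete(id)
        nil
      end
    end
  end
end

registry = IdRegistry.new
config = Object.new
config_id = registry.register(config)
other_id = registry.register(Object.new)

found = registry.lookup(config_id)   # the same object, while it is alive
missing = registry.lookup(999)       # never issued
```

Since IDs are never reused, a lookup either returns the original object or nothing, and never a different object. The overhead mentioned above is visible even in this sketch: every registered object costs a hash entry, a `WeakRef`, and a lock acquisition.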

I propose that both methods should immediately be deprecated for removal in Ruby 3.0.

* They do not do what people expect.
* They cannot reliably do what they claim to do.
* They eventually lead to difficult-to-diagnose bugs in every possible use case.

Put simply, both methods have always been broken in MRI and making them unbroken would render them useless.



-- 
https://bugs.ruby-lang.org/
