Issue #15408 has been updated by headius (Charles Nutter).


I'm glad most of us are in agreement about _id2ref!

> object_id ... is idempotent,

Yeah you're right here...I realize the lack of idempotency applies to using it in _id2ref, since it will eventually return a different result over time, but a given object currently does maintain a consistent object_id.

>> and no two active objects will share an id.

> That's the same as System.identityHashCode().

System.identityHashCode makes no uniqueness guarantees at all. It's absolutely possible for two objects to have the same identityHashCode, especially because it's only a 32-bit signed integer.

object_id guarantees uniqueness against other currently-alive objects, since it's the pointer to each object.

As I mentioned above, I'd be mostly satisfied if object_id were reduced to an identity hash code OR if it were generated and guaranteed unique for the lifetime of the process. It's somewhere in the middle right now and that's where the problems come from.

> I wonder how _id2ref works in MRI if objects are moved by the GC

ko1 and tenderlove know the current status of this, but up until recently no objects were moved in MRI ever. Now with generational GC and compaction, they'll absolutely be moved around, so object_id *must* change, regardless of how much people love it the way it is. The two options I've spelled out are reasonable alternatives.

At this point, my main concern is having it be called anything like "ID" without uniqueness.

If it remains "object_id" I think it needs to use the monotonically-increasing value.

If it will be reduced to an identity hashcode, it should not be named "object_id". "identity_hash" (mirrors its use in Hash) or something similar would be better/more accurate/more descriptive.
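
For reference, this is the identity-based keying that Hash already offers through compare_by_identity. A small sketch; the variable names are only for illustration:

```ruby
# Two distinct String objects with equal contents.
a = String.new("key")
b = String.new("key")

# A normal Hash keys on eql?/hash, so a and b collide into one entry.
by_value = {}
by_value[a] = 1
by_value[b] = 2

# With compare_by_identity, keys are matched by object identity instead,
# so a and b are separate entries.
by_identity = {}.compare_by_identity
by_identity[a] = 1
by_identity[b] = 2

puts "by value: #{by_value.size}, by identity: #{by_identity.size}"
```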

----------------------------------------
Feature #15408: Deprecate object_id and _id2ref
https://bugs.ruby-lang.org/issues/15408#change-75666

* Author: headius (Charles Nutter)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
Ruby currently provides the object_id method to get an "identifier" for a given object. According to the documentation, this ID is the same for every object_id call against a given object, and guaranteed not to be the same as any other active (i.e. alive) object. However, no guarantee is made about the ID being reused for a future object after the original has been garbage collected.

As a result, object_id can't be used to uniquely identify any object that might be garbage collected, since that ID may be associated with a completely different object in the future.
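
The two documented guarantees are easy to check. A minimal sketch; note that it says nothing about reuse after collection, which is the part that is not guaranteed:

```ruby
a = Object.new
b = Object.new

# Repeated calls on the same object return the same value.
stable = (a.object_id == a.object_id)

# Two objects that are both still alive never share an ID.
distinct = (a.object_id != b.object_id)

puts "stable: #{stable}, distinct: #{distinct}"
```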

Ruby also provides a method to go from an object_id to the object reference itself: ObjectSpace._id2ref. This method has been in Ruby for decades and is often used to implement a weak hashmap from ID to reference, since holding the ID will not keep the object alive. However, because object_id is not actually unique over time, it's possible for _id2ref to return a different object than the one that originally had that ID, as object slots are reused in the heap.

The only way to implement object_id safely (with idempotency guarantees) would be to assign to all objects a monotonically-increasing ID. Alternatively, this ID could be assigned lazily only for those objects on which the code calls object_id. JRuby implements object_id in this way currently.
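
As a rough illustration of the lazy variant, here is a sketch in plain Ruby. SequentialId is a hypothetical helper, not part of Ruby and not JRuby's actual implementation; it assumes Ruby 2.7 or later, where ObjectSpace::WeakMap accepts Integer values:

```ruby
# Hypothetical helper: hands out process-unique IDs on first request.
module SequentialId
  @next_id = 0
  @ids     = ObjectSpace::WeakMap.new  # object => id, keys are held weakly
  @lock    = Mutex.new

  # Returns the ID for obj, assigning the next counter value the first time.
  def self.for(obj)
    @lock.synchronize do
      @ids[obj] ||= (@next_id += 1)
    end
  end
end

a = Object.new
b = Object.new
id_a = SequentialId.for(a)
id_b = SequentialId.for(b)
puts "a=#{id_a} b=#{id_b} again=#{SequentialId.for(a)}"
```

Because the counter never goes backwards, an ID is never handed to a second object, even after the first one has been collected.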

The only way to implement _id2ref safely would be to have a mapping in memory from those monotonically-increasing IDs to the actual objects. This would have to be a weak mapping so that it does not itself keep the objects from being garbage collected. JRuby currently only supports _id2ref via a flag, since the additional overhead of weakly tracking every requested object_id is extremely high. An alternative for MRI would be to implement _id2ref as a heap scan, as it is implemented in Rubinius. This would make it entirely impractical due to the cost of scanning the heap for every ID lookup.
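
A sketch of such a weak ID-to-object table, using WeakRef from the standard library. IdRegistry and its method names are hypothetical; this only shows the shape of the bookkeeping, not how any implementation actually does it:

```ruby
require "weakref"

# Hypothetical registry: maps monotonically-increasing IDs to weak references.
module IdRegistry
  @next_id = 0
  @refs    = {}          # id => WeakRef; the Hash does not keep objects alive
  @lock    = Mutex.new

  # Assigns a fresh ID to obj and records a weak reference to it.
  def self.register(obj)
    @lock.synchronize do
      id = (@next_id += 1)
      @refs[id] = WeakRef.new(obj)
      id
    end
  end

  # Returns the object for id, or nil if the ID is unknown or the
  # object has already been garbage collected.
  def self.lookup(id)
    ref = @lock.synchronize { @refs[id] }
    return nil unless ref && ref.weakref_alive?
    ref.__getobj__
  rescue WeakRef::RefError
    nil
  end
end

obj = Object.new
id  = IdRegistry.register(obj)
puts IdRegistry.lookup(id).equal?(obj)
```

Note the cost this implies: one table entry and one WeakRef per tracked object. In this sketch the dead entries are never pruned either, so a real version would need a cleanup pass on top.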

I propose that both methods should immediately be deprecated for removal in Ruby 3.0.

* They do not do what people expect.
* They cannot reliably do what they claim to do.
* They eventually lead to difficult-to-diagnose bugs in every possible use case.

Put simply, both methods have always been broken in MRI and making them unbroken would render them useless.



-- 
https://bugs.ruby-lang.org/
