On 28.10.2007 17:19, Charles Oliver Nutter wrote:
> Robert Klemme wrote:
>> IMHO ObjectSpace should not be implemented in Java land.  Why?  The 
>> JVM has to keep track of instances anyway and implementing this in 
>> Java via WeakReferences seems to duplicate functionality that is 
>> already there. Did you consider using "Java Virtual Machine Tools 
>> Interface"?
>>
>> http://java.sun.com/javase/6/webnotes/trouble/TSG-VM/html/gbmmt.html#gbmls 
>>
>>
>> You could either follow the same approach of the heapTracker presented 
>> on that page and use a flag or require a lib that enables ObjectSpace 
>> (because of the overhead of instrumentation).
> 
> You just hit on exactly why we don't use JVMTI for ObjectSpace. It would 
> certainly work, but it would add a lot of overhead we'd never expect 
> people to accept in a real application. Plus, it would track far more 
> object instances than we actually want tracked.

Why is that?  I mean, you could selectively decide which instances to track.

> We'd love to include a 
> JVMTI-based ObjectSpace implementation, however...it just hasn't been a 
> high priority to implement since 99% of users never actually need 
> ObjectSpace.
> 
>> Alternatively there may be another method that does not need 
>> instrumentation and that can give you access to every (reachable) 
>> object in the JVM.
> 
> If there is...we haven't found it. The "linked weakref list" has been 
> the least overhead so far, and it's still a lot of overhead.

Hmm, but JVMTI has heap iteration functions that work like #each_object:
http://java.sun.com/j2se/1.5.0/docs/guide/jvmti/jvmti.html#Heap

Did you rule them out because of the "stop the world" approach?  I'd say 
that would be ok - at least it's better than not having ObjectSpace. 
Also, there would be no per-object tracking overhead during normal 
execution.  The only question is whether it's ok to invoke arbitrary 
byte code, which would happen during the iteration callback.

>>> Your idea has come up in the past, and it would probably eliminate 
>>> the cost of an ObjectSpace list. However that doesn't appear to be 
>>> where we pay the highest cost.
>>>
>>> The two items that (we believe) cost the most for us on the JVM are:
>>>
>>> - Constructing an extra object for every Ruby object...namely, the 
>>> WeakReference object to point to it. So we pay a 
>>> memory/allocation/initialization cost.
>>> - WeakReference itself causes Java's GC to have to do additional 
>>> checks, so it can notify the WeakReference that the object it points 
>>> at has gone away. So that slows down the legendary HotSpot GC and we 
>>> pay again.
>>>
>>> I believe the parent -> weakref -> children algorithm is used in some 
>>> implementations of ObjectSpace-like behavior, so it's perfectly 
>>> valid. But again, there are certain aspects of ObjectSpace that are 
>>> just problematic...
>>>
>>> - threading or concurrency of any kind? No, you can't have 
>>> multithreading with ObjectSpace, nor a concurrent/parallel GC (and it 
>>> potentially excludes other advanced GC designs too).
>>> - determinism? Matz told me that "ObjectSpace doesn't have to be 
>>> deterministic"...but when it starts getting wired into libraries like 
>>> test/unit, it seems like people expect it to be. If we can say OS 
>>> isn't deterministic, then *nobody* should be relying on its contents 
>>> for core libraries, and we could reasonably claim that each_object 
>>> will never return *anything*.
>>
>> I'd reformulate the requirement here: ObjectSpace.each_object must 
>> yield every object that was existent before the invocation and that is 
>> strongly reachable.  I believe for the typical use case (e.g. 
>> traversing all class instances) this is enough while leaving enough 
>> flexibility for the implementation (e.g. create a snapshot of some 
>> form, iterate through some internal structure that may change due to 
>> new objects being created during #each_object etc.).
> 
> The problem here is "strongly reachable". During ObjectSpace processing, 
> the last strong reference to an object may go away and the garbage 
> collector may run. Should ObjectSpace prevent GC from running if it's 
> traversed and now references that object? If not, how should it be 
> handled if immediately before you return an object from each_object, it 
> gets garbage collected?

You are right: objects can "disappear" (i.e. lose their strong 
reachability) during traversal.  Obviously my suggested requirement was 
still too strong.

> There's no way to catch that, so each_object may 
> end up returning a reference to an object that's gone away, or 
> reconstituting an object whose finalization has already fired. Bad 
> things happen.

Recreation is a bad idea.  I agree, objects that are no longer strongly 
reachable at the moment they are about to be passed to the block should 
*not* be passed.
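
To illustrate the rule, here is a minimal sketch in Ruby.  WeakRegistry 
is a hypothetical class made up for this mail, not how JRuby implements 
ObjectSpace.  The point is only that the iterator takes a strong 
reference first and skips the entry if that fails, so the block never 
sees an object that has already gone away:

```ruby
require 'weakref'

# Hypothetical registry for illustration only; not JRuby's implementation.
class WeakRegistry
  def initialize
    @refs = []
  end

  # Track obj without keeping it alive; returns obj for convenience.
  def register(obj)
    @refs << WeakRef.new(obj)
    obj
  end

  # Yield every registered object that is still alive.
  # Returns the number of objects yielded.
  def each_object
    return enum_for(:each_object) unless block_given?

    yielded = 0
    @refs.each do |ref|
      begin
        # Take a strong reference before yielding, so the object cannot
        # be collected between the liveness check and the block call.
        obj = ref.__getobj__
      rescue WeakRef::RefError
        next # already collected: do not pass it to the block
      end
      yield obj
      yielded += 1
    end
    yielded
  end
end
```

Checking weakref_alive? and then dereferencing would leave a window in 
which the object can still be collected, which is why the sketch 
dereferences once and rescues the error instead.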

> ObjectSpace is just not compatible with any GC that requires the ability 
> to move objects around in memory,

I don't think that moving is an issue.  If it were, JVMs would not work 
the way they do (object references are not pointers to memory 
locations).  In other words, all programs would have the same problems 
that #each_object has.

> run in parallel, and so on. It can 
> *never* be deterministic unless it can "stop the world", so it should 
> not be used for algorithms that require any level of determinism, such 
> as the test search in test/unit.

Right you are.  #each_object should not be used in regular code - it's 
more for ad hoc statistics ("how many instances of a class?") and the like.
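
A small sketch of what I mean by ad hoc statistics.  Widget is a made-up 
class for this example; the count is a lower bound only, because other 
instances may still be around if they have not been collected yet:

```ruby
# Hypothetical class used only for this example.
class Widget; end

# Keep strong references so these three instances stay alive.
widgets = Array.new(3) { Widget.new }

# Without a block, each_object returns an Enumerator over live instances.
count = ObjectSpace.each_object(Widget).count
puts "Widget instances currently alive: #{count}"
```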

Kind regards

	robert