Paul Brannan wrote:
> If what Matz says is true, that it's 8-10% slower than the current
> implementation, isn't that enough of a yellow flag?
> 
> Is it possible to improve performance of the patch?  I looked over the
> implementation for a little while, but didn't see any obvious
> opportunities for improvement.  I'd be interested to see profiler output
> to see where time is being spent.

I've looked for ways to optimize it, but the bitfield implementation 
already seems to be near-optimal. All the old garbage collector had to 
do was set a flag. But this one has to:
1. Call a mark function.
2. Find the heap that the object is on. This operation is O(n), with n = 
the number of heaps, though it uses a cache to speed up the average case.
3. Calculate the location in the bitfield. This requires a division (/) 
and a remainder (%) operation.
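
To make the three steps concrete, here is a rough sketch of what marking 
with a bitfield involves. This is not the code from the patch: the names 
(heap_t, find_heap, mark_object), the slot size, and the one-entry cache 
are all made up for illustration.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* All sizes and names below are hypothetical, chosen for illustration. */
#define SLOT_SIZE      20u    /* bytes per object slot */
#define SLOTS_PER_HEAP 1024u
#define MAX_HEAPS      8u
#define BITS_PER_WORD  (sizeof(unsigned int) * CHAR_BIT)

typedef struct {
    unsigned char slots[SLOT_SIZE * SLOTS_PER_HEAP];
    /* One mark bit per slot, stored outside the objects themselves. */
    unsigned int marks[(SLOTS_PER_HEAP + BITS_PER_WORD - 1) / BITS_PER_WORD];
} heap_t;

static heap_t heaps[MAX_HEAPS];
static size_t heaps_used = MAX_HEAPS;
static heap_t *last_heap = NULL;   /* one-entry cache for step 2 */

static int heap_contains(const heap_t *h, const void *p)
{
    uintptr_t addr = (uintptr_t)p;
    uintptr_t start = (uintptr_t)h->slots;
    return addr >= start && addr < start + sizeof h->slots;
}

/* Step 2: O(n) scan over the heaps, with a cache for the common case. */
static heap_t *find_heap(const void *p)
{
    size_t i;
    if (last_heap && heap_contains(last_heap, p))
        return last_heap;
    for (i = 0; i < heaps_used; i++) {
        if (heap_contains(&heaps[i], p)) {
            last_heap = &heaps[i];
            return last_heap;
        }
    }
    return NULL;
}

/* Step 1 is the call to this function itself. Returns 1 on success,
 * 0 if the pointer is not inside any heap. */
static int mark_object(const void *p)
{
    heap_t *h = find_heap(p);
    size_t index;
    if (!h)
        return 0;
    /* Step 3: one division for the slot index, then a division and a
     * remainder to locate the bit inside the bitfield. */
    index = (size_t)((const unsigned char *)p - h->slots) / SLOT_SIZE;
    h->marks[index / BITS_PER_WORD] |= 1u << (index % BITS_PER_WORD);
    return 1;
}

static int is_marked(const void *p)
{
    heap_t *h = find_heap(p);
    size_t index;
    if (!h)
        return 0;
    index = (size_t)((const unsigned char *)p - h->slots) / SLOT_SIZE;
    return (h->marks[index / BITS_PER_WORD] >> (index % BITS_PER_WORD)) & 1u;
}
```

Compare this with a flag stored in the object, where marking is a single 
OR on a field the collector already has a pointer to.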

(1), the function call overhead, could be reduced with aggressive 
inlining and use of macros. The downside is that heavy use of macros 
would make the code very hard to read.

(2) could be solved as follows: put an index field in each VALUE object. 
Then the mark function can find the location of the heap (and thus the 
bitfield) in O(1) time. But this will make each VALUE object at least 4 
bytes bigger (on x86). Is this increase in memory usage acceptable for 
an 8-10% performance gain?
I've also thought about ways to make this operation O(1) without an 
index field. *Maybe* it is possible by ensuring that heap addresses are 
aligned on some multiple of an integer constant. Then finding the heap 
from an object would only involve the extraction of a few bits in the 
object address. But this is highly platform-dependent and I'm not sure 
whether it is viable to implement this without making the code 
unmaintainable and unportable.
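
A minimal sketch of the alignment idea, assuming every heap starts at an 
address that is a multiple of a power-of-two constant. HEAP_ALIGN, 
heap_header_t, heap_alloc and heap_of are hypothetical names, and the 
over-allocating malloc trick is just one possible way to get aligned 
memory. The integer-to-pointer conversions are implementation-defined, 
which is exactly the portability concern.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical heap size and alignment; must be a power of two. */
#define HEAP_ALIGN ((uintptr_t)16384)

typedef struct {
    void *raw;               /* pointer returned by calloc, kept for free() */
    unsigned int marks[16];  /* mark bitfield lives in the heap header */
} heap_header_t;

/* Allocate twice the heap size so that an aligned block of HEAP_ALIGN
 * bytes is guaranteed to fit somewhere inside the allocation. */
static heap_header_t *heap_alloc(void)
{
    void *raw = calloc(2, (size_t)HEAP_ALIGN);
    uintptr_t aligned;
    heap_header_t *h;
    if (!raw)
        return NULL;
    aligned = ((uintptr_t)raw + HEAP_ALIGN - 1) & ~(HEAP_ALIGN - 1);
    h = (heap_header_t *)aligned;
    h->raw = raw;
    return h;
}

/* O(1): clear the low bits of the object address to get the heap start. */
static heap_header_t *heap_of(const void *obj)
{
    return (heap_header_t *)((uintptr_t)obj & ~(HEAP_ALIGN - 1));
}
```

With this layout no scan over the heaps is needed, at the cost of wasting 
memory on alignment and relying on how the platform maps pointers to 
integers.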

(3): I don't think this can be solved.

In any case, all of these operations together need several pointer 
dereferences. The old GC needed only one.

Also, because Windows does not support fork(), the garbage collector 
doesn't have to use a bitfield for marking when compiled on Windows. 
This could be selected with a macro.
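
Something along these lines is what I have in mind. The macro name 
GC_USE_MARK_BITFIELD, the FL_MARK value, and the tiny object table are 
placeholders for illustration, not names from the patch.

```c
#include <stddef.h>

/* Hypothetical switch: default to flag marking on Windows (no fork()),
 * bitfield marking everywhere else. Can be overridden with -D. */
#ifndef GC_USE_MARK_BITFIELD
# ifdef _WIN32
#  define GC_USE_MARK_BITFIELD 0
# else
#  define GC_USE_MARK_BITFIELD 1
# endif
#endif

#define NUM_OBJECTS 64
#define FL_MARK     0x1u

typedef struct {
    unsigned int flags;
} object_t;

static object_t objects[NUM_OBJECTS];

#if GC_USE_MARK_BITFIELD
/* Mark bits are kept outside the objects, so marking does not write
 * to the objects themselves. */
static unsigned char mark_bits[(NUM_OBJECTS + 7) / 8];

static void gc_mark(object_t *o)
{
    size_t i = (size_t)(o - objects);
    mark_bits[i / 8] |= (unsigned char)(1u << (i % 8));
}

static int gc_is_marked(const object_t *o)
{
    size_t i = (size_t)(o - objects);
    return (mark_bits[i / 8] >> (i % 8)) & 1;
}
#else
/* Old behaviour: set a flag directly in the object. */
static void gc_mark(object_t *o)
{
    o->flags |= FL_MARK;
}

static int gc_is_marked(const object_t *o)
{
    return (o->flags & FL_MARK) != 0;
}
#endif
```

The rest of the collector would only call gc_mark() and gc_is_marked(), so 
the choice stays in one place.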