Here is a recent patch I've been experimenting with--any advice is welcome. [1]

It runs the garbage collector in a forked child process--the child discovers
any garbage and reports back to the parent process which existing
objects are freeable
[or rather, which objects were freeable at the time the child was forked],
so the parent does little of the collection itself.  It's
like the parent gets a premonition--"here are all the freeable
objects!"
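
To make the idea concrete, here is a minimal sketch in plain Ruby. It is
not the patch itself [that lives in gc.c]; the toy heap, ROOTS, and both
method names are made up for illustration, and it assumes a platform
where fork is available. The child walks a snapshot of the object graph
and reports the unreachable ids over a pipe. Anything that was garbage
at fork time is still garbage later, so the parent can free it whenever
the report arrives.

```ruby
# Toy "heap": object id => ids of the objects it references (hypothetical data).
HEAP  = { 1 => [2], 2 => [], 3 => [4], 4 => [3], 5 => [] }
ROOTS = [1, 5]

# Mark phase: collect every id reachable from the roots.
def reachable(heap, roots)
  seen  = {}
  stack = roots.dup
  until stack.empty?
    id = stack.pop
    next if seen[id]
    seen[id] = true
    stack.concat(heap.fetch(id, []))
  end
  seen
end

# Fork a child that marks its snapshot of the heap and writes the
# unreachable ids to a pipe. The parent runs the given block in the
# meantime, then reads the report.
def freeable_via_fork(heap, roots)
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    dead = heap.keys - reachable(heap, roots).keys
    writer.puts(dead.join(","))
    writer.close
    exit!(0) # skip at_exit handlers inherited from the parent
  end
  writer.close
  yield if block_given? # parent keeps working while the child marks
  ids = reader.read.strip.split(",").map(&:to_i)
  reader.close
  Process.wait(pid)
  ids
end

# Object 6 is allocated after the fork, so the child never sees it.
dead = freeable_via_fork(HEAP, ROOTS) { HEAP[6] = [] }
dead.each { |id| HEAP.delete(id) } # the parent only has to free
p dead # [3, 4]
```

Note that object 6 is unreachable too, but it is not reported because it
did not exist when the child was forked; it would be picked up by the
next collection.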

Currently it is more of a prototype than anything.  It does seem to work
[either that or my tests are wrong--also a strong possibility].

on a Linux box:
time ./ruby -e 'a = []; 10_000_000.times { a << 3}; 100_0000.times { "b"}'

gc.c original:
real 0m6.993s
41MB RSS

gc.c original with HEAP_MIN_SIZE at 100000 instead of 10000:
real 0m2.942s # interesting result
42MB RSS

threaded GC [also has HEAP_MIN_SIZE at 10000]:
real 0m2.164s
43MB RSS

It's currently not copy-on-write friendly at all.  "Theoretically" it
allows the parent process to continue working while the garbage is being
collected, and it can also offload the collection to a totally
separate core.  Not quite a reality yet, but theoretically possible.
Currently it doesn't use 100% of two cores because it takes system
time to duplicate the memory as it's first processed, slowing down both
processes slightly.

The thought for its creation came a few months ago while reading the
Book of Mormon--the connection? Beats me.
Take care all.
-=R
[1] http://gist.github.com/18242 [also available at
http://wilkboardonline.com/roger/gc.c since I can never get the
patches to work quite right]