> There are many research papers on this topic, you can probably find some by
> googling.

Good call.
Here are some insights gleaned from a few papers.

You could possibly sweep in a different thread too, maybe if you put
up a write barrier during the sweep, so that it could still be
conservative [?]

You could also use several threads to speed up the sweep phase, each
one sweeping a different segment of the heap.
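
To make the segment idea concrete, here is a rough sketch in Python. It
is a toy model, not Ruby's actual gc.c: the heap is a list of segments,
each slot is a dict with a "marked" flag, and the names sweep_segment
and parallel_sweep are made up for illustration. Each worker only
touches its own segment, so the workers share no state. [In CPython the
GIL means this shows the partitioning, not a real speedup.]

```python
from concurrent.futures import ThreadPoolExecutor


def sweep_segment(segment):
    """Sweep one segment and return the indices of the slots it freed."""
    freed = []
    for i, slot in enumerate(segment):
        if slot["marked"]:
            slot["marked"] = False  # survivor: reset for the next mark phase
        else:
            freed.append(i)         # unmarked: the slot can be reused
    return freed


def parallel_sweep(heap_segments, workers=4):
    """Sweep every segment on a thread pool, one task per segment.

    Returns one freed-index list per segment, in segment order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(sweep_segment, heap_segments))
```

Keeping a separate result per segment means nobody has to lock a
shared freelist; merging them can wait until all workers are done.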

You could do "incremental sweeps" instead of doing all the sweeping at
once in one huge pass right after the mark phase [i.e. single threaded,
and when the freelist runs out, you sweep another few blocks, etc.].
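
A minimal sketch of what that could look like, again as a toy Python
model rather than anything taken from Ruby's source. LazyHeap,
sweep_some and allocate are hypothetical names, and the sketch assumes
the mark phase has already set the "marked" flags.

```python
class LazyHeap:
    """Toy heap that sweeps block by block, only when the freelist is empty."""

    def __init__(self, blocks):
        self.blocks = blocks   # list of blocks; each block is a list of slot dicts
        self.next_block = 0    # first block not yet swept in this cycle
        self.freelist = []     # (block_index, slot_index) pairs ready for reuse

    def sweep_some(self, max_blocks=1):
        """Sweep up to max_blocks unswept blocks; return how many were swept."""
        swept = 0
        while swept < max_blocks and self.next_block < len(self.blocks):
            b = self.next_block
            for s, slot in enumerate(self.blocks[b]):
                if slot["marked"]:
                    slot["marked"] = False       # survivor: clear for next cycle
                else:
                    self.freelist.append((b, s))  # garbage: make it reusable
            self.next_block += 1
            swept += 1
        return swept

    def allocate(self):
        """Hand out a free slot, sweeping more blocks only if needed."""
        while not self.freelist:
            if self.sweep_some() == 0:
                # Nothing left to sweep. A real collector would start a new
                # mark phase or grow the heap here; the sketch just gives up.
                return None
        return self.freelist.pop()
```

The pause per allocation is then bounded by the cost of sweeping a
block or two, instead of the cost of sweeping the whole heap.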

Another thought would be that whenever you hit an
rb_thread_blocking_region, you do these incremental sweeps [in a
separate thread] until the rb_thread_blocking_region returns, or, as
the original author (Doug Beaver) put it, "sweep for N milliseconds, or
until you free M KB of memory".
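
The "N milliseconds or M KB" rule could be sketched roughly like this.
It is a toy Python model: budgeted_sweep is a made-up name, the 40-byte
slot size is only an assumed figure, and the hook into
rb_thread_blocking_region is not modeled at all. The caller would keep
calling it with the returned position while the blocking call is in
flight.

```python
import time


def budgeted_sweep(blocks, start, max_ms, max_bytes, slot_size=40):
    """Sweep blocks from index `start` until a time or byte budget runs out.

    Returns (next_block_index, freed_bytes) so a later call can resume.
    slot_size is an assumed per-slot size, used only to count freed bytes.
    The budgets are checked between blocks, so one block is the smallest
    unit of work.
    """
    deadline = time.monotonic() + max_ms / 1000.0
    freed_bytes = 0
    i = start
    while (i < len(blocks)
           and freed_bytes < max_bytes
           and time.monotonic() < deadline):
        for slot in blocks[i]:
            if slot["marked"]:
                slot["marked"] = False    # survivor: clear for next cycle
            else:
                freed_bytes += slot_size  # garbage: count it as reclaimed
        i += 1
    return i, freed_bytes
```

For example, budgeted_sweep(blocks, 0, max_ms=5, max_bytes=64 * 1024)
would stop after 5 ms or 64 KB, whichever comes first.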

A compacting GC would be nice, too, though I'm not sure it would be
possible, since you need to update old pointers to data with their new
pointer equivalents. How could one tell which values on the stack are
old pointers to data and which are false positives?

Seems that the low-hanging fruit would be to run the sweep phase in
another thread [assuming multi-core with cores going unused, which I
think is a common case with VPSes with little RAM, though maybe it's
not?].

Thanks.

=r