On Tue, Jan 12, 2010 at 5:23 AM, Brent Roman <brent / mbari.org> wrote:
>
> This paper and excellent slide set seem very promising:
>
> http://matsu-www.is.titech.ac.jp/~endo/papers/endo-ismm2002-gc.pdf
> http://matsu-www.is.titech.ac.jp/~endo/papers/endo-ismm2002-gc.ppt
>
> Toshio Endo and Kenjiro Taura adapted the Boehm conservative GC
> to *dramatically* reduce pauses with incremental collection.    
> about 10% to execution times on a single core CPU, but improves
> execution times over basic mark-and-sweep
> on multi-core CPU -- by allowing marking to run
> in parallel on multiple cores.
>
> This GC seems like a good fit to me.     > use multi-core processors, it should be an all around performance "win".
>
> Has anyone looked into grafting this GC into MRI?
>
> Does anyone see any conceptual problems in doing so?

Well, I've just scanned the paper, but I have a few comments.

1) The algorithm they present relies on hooking into the virtual
memory implementation in order to implement a write barrier by write
protecting pages, catching the exception caused when a store is done
into a protected page, marking the page as dirty, and then un-write
protecting the page.  I'm not sure how portable this is across various
combinations of operating system and processor.

2) IMHO, conservative GCs are compromises which attempt to provide GC
to languages without a strong 'object model'. And without help from
the compiler. By object model here, I mean more than just how objects
are laid out, or how methods are found, but a design which is factored
so that at the least references to objects vs. non-objects can be
distinguished, and reference changes can be efficiently tracked,
usually with some support from the compiler, and/or implementation of
the 'byte-codes'.

There's no reason why Ruby couldn't be implemented with such an object
model, in fact I don't think that it's far from that.

There are decades of experience in implementing VMs for such languages
which provide efficient accurate garbage collection, while also
providing at least, and usually two, interfaces for invoking code not
generated by the languages compiler:

1) An interface for invoking procedural library code which is totally
unaware which language invoked it.  Such code takes a list of
parameters and returns result value(s). This code need have no
interaction with the GC, the interface insures that object references
don't pass across the interface, all the parameters get converted to
and result values get converted from basic data types (ints, strings,
structs ...).  This is what the FFI does.

2) An interface for using a low-level language for writing
'primitives' which are aware that they need to interact with the VM
and the garbage collector.  This requires that the interface provides
macros and/or function calls/call backs and a set of rules to be
followed in writing the primitives.

The biggest problem with MRI, is that it started out using one api for
both of these.  I suspect that a large percentage of Ruby extensions
are really cases of use case 1.

If Ruby were to take a path of deprecating using the extension api for
use case 1 in favor of FFI, and then evolving the extension API to
support having objects move during GC (which the similar APIs for
Smalltalk, Java and VM base languages have done), it could actually
get a competitive VM/GC.

-- 
Rick DeNatale

Blog: http://talklikeaduck.denhaven2.com/
Twitter: http://twitter.com/RickDeNatale
WWR: http://www.workingwithrails.com/person/9021-rick-denatale
LinkedIn: http://www.linkedin.com/in/rickdenatale