Hi.

I implemented a bitmap marking GC for Ruby 2.0.

Source code: https://github.com/authorNari/ruby/tree/bitmap_marking
Patch: https://github.com/authorNari/patch_bag/blob/master/ruby/gc_bitmap_using_alignment_r33786.patch

In the following environment, this patch passes 'make check' and
'make TESTS="--gc-stress" test-all'.

$ ruby -v
ruby 2.0.0dev (2011-11-18 trunk 33786) [x86_64-linux]

= Performance evaluation

== make benchmark
The result of make benchmark OPTS="-r 5" is here.
https://gist.github.com/1542547
In general, the patched build is a little bit slower.

With bitmap marking, the GC has to look up the bitmap for each object
during the mark phase, so marking becomes a little bit slower.

== skkzipcode
Bitmap Marking GC is copy-on-write friendly, as the GC in Ruby
Enterprise Edition is.
http://www.rubyenterpriseedition.com/faq.html

I measured this improvement with skkzipcode, a benchmark program.
In skkzipcode, the parent process holds a large amount of data and the
child processes use data that is shared with the parent process.
https://github.com/authorNari/skkzipcode
(This program uses /proc/PID/smaps to profile memory usage.)
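
As a rough illustration of that kind of measurement, here is a minimal
sketch in C that sums the shared and private page counters from
smaps-formatted text. It is not the code skkzipcode uses; the names
smaps_totals and smaps_sum are hypothetical, and I am assuming that
"shared" means Shared_Clean + Shared_Dirty and "private" means
Private_Clean + Private_Dirty.

```c
#include <stdio.h>

/* Hypothetical result type: totals in kB, as reported by smaps. */
struct smaps_totals {
    long shared_kb;   /* Shared_Clean + Shared_Dirty   */
    long private_kb;  /* Private_Clean + Private_Dirty */
};

/* Sum the shared/private counters of every mapping in an smaps stream.
 * Returns 0 on success, -1 on a read error. */
static int smaps_sum(FILE *fp, struct smaps_totals *out)
{
    char line[256];
    long kb;

    out->shared_kb = 0;
    out->private_kb = 0;
    while (fgets(line, sizeof line, fp) != NULL) {
        /* sscanf returns 1 only when the literal prefix matched
         * and the number was read. */
        if (sscanf(line, "Shared_Clean: %ld kB", &kb) == 1 ||
            sscanf(line, "Shared_Dirty: %ld kB", &kb) == 1) {
            out->shared_kb += kb;
        } else if (sscanf(line, "Private_Clean: %ld kB", &kb) == 1 ||
                   sscanf(line, "Private_Dirty: %ld kB", &kb) == 1) {
            out->private_kb += kb;
        }
    }
    return ferror(fp) ? -1 : 0;
}
```

To measure a live process, one would fopen() "/proc/PID/smaps" for each
child and add up the totals returned by smaps_sum().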

origin
PROCESS_CNT : 5
SHARED_TOTAL: 59124     kb
PRIV_TOTAL  : 224892    kb

REE - GC.copy_on_write_friendly = true
PROCESS_CNT : 5
SHARED_TOTAL: 207720    kb
PRIV_TOTAL  : 164572    kb

bmap - Bitmap Marking GC for Ruby 2.0
PROCESS_CNT : 5
SHARED_TOTAL: 170744    kb
PRIV_TOTAL  : 138336    kb

* PROCESS_CNT: number of child processes
* SHARED_TOTAL: total shared memory usage of the child processes (KB)
* PRIV_TOTAL: total private memory usage of the child processes (KB)

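Compared with origin, bmap raises SHARED_TOTAL from 59124 KB to
170744 KB and lowers PRIV_TOTAL from 224892 KB to 138336 KB.
bmap is copy-on-write friendly!!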

= Implementation
Let me introduce some implementation topics.

* Heap block addresses are aligned to 16KB so that the bitmap for an
  object can be found quickly.
  * On Linux, it uses posix_memalign() or memalign().
  * On Windows, it uses _aligned_malloc().
* To avoid unnecessary writes, the GC relinks the freelist less often.
  * The GC doesn't relink objects that were already on the freelist
    when the GC started.
* Each heap slot has its own freelist.
* A struct heaps_slot is embedded in each heap block.
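
The alignment trick can be sketched roughly as follows. This is a
simplified illustration in C, not the code from the patch: the names
block_header, block_new, block_of, slot_at, slot_index, mark and
is_marked, and the 40-byte slot size, are all hypothetical. It assumes
a POSIX system with posix_memalign(). Because every block starts on a
16KB boundary, masking the low bits of an object address yields the
block header, and the mark bits live outside the block, so marking does
not write to the pages that hold the objects.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for posix_memalign() */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

enum { HEAP_ALIGN = 16 * 1024 };  /* 16KB block size and alignment */
#define HEAP_ALIGN_MASK (~((uintptr_t)HEAP_ALIGN - 1))
#define SLOT_SIZE 40              /* hypothetical object slot size */
#define WORD_BITS (sizeof(uintptr_t) * 8)

/* Hypothetical header embedded at the start of each heap block. */
struct block_header {
    uintptr_t *bits;   /* mark bitmap, allocated outside the block */
    size_t nslots;     /* number of object slots in this block */
};

/* Allocate one aligned block plus its separate, zeroed bitmap. */
static struct block_header *block_new(void)
{
    void *mem = NULL;
    if (posix_memalign(&mem, HEAP_ALIGN, HEAP_ALIGN) != 0) return NULL;
    struct block_header *h = mem;
    h->nslots = (HEAP_ALIGN - sizeof(*h)) / SLOT_SIZE;
    h->bits = calloc((h->nslots + WORD_BITS - 1) / WORD_BITS,
                     sizeof(uintptr_t));
    if (h->bits == NULL) { free(mem); return NULL; }
    return h;
}

/* Find the block header by clearing the low bits of the address. */
static struct block_header *block_of(const void *obj)
{
    return (struct block_header *)((uintptr_t)obj & HEAP_ALIGN_MASK);
}

/* Address of the i-th slot; slots follow the header directly. */
static void *slot_at(struct block_header *h, size_t i)
{
    return (char *)(h + 1) + i * SLOT_SIZE;
}

/* Index of the slot that contains obj within its block. */
static size_t slot_index(const void *obj)
{
    const char *first = (const char *)(block_of(obj) + 1);
    return (size_t)((const char *)obj - first) / SLOT_SIZE;
}

/* Set the mark bit; only the bitmap page is written. */
static void mark(const void *obj)
{
    struct block_header *h = block_of(obj);
    size_t i = slot_index(obj);
    assert(i < h->nslots);
    h->bits[i / WORD_BITS] |= (uintptr_t)1 << (i % WORD_BITS);
}

/* Test the mark bit for obj. */
static int is_marked(const void *obj)
{
    struct block_header *h = block_of(obj);
    size_t i = slot_index(obj);
    return (int)((h->bits[i / WORD_BITS] >> (i % WORD_BITS)) & 1);
}

/* Release the bitmap and the block. */
static void block_free(struct block_header *h)
{
    free(h->bits);
    free(h);
}
```

The extra mask, subtraction and division on every mark are the kind of
per-object lookup cost that a mark bit stored in the object header
would not need.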

This patch improves memory usage for programs that use fork() on
Linux. In CRuby, we have to use fork() when we need real parallel
performance, and we already have many libraries that use fork()
(e.g. Unicorn, Resque).

GC is a little bit slower, but I think it is within an acceptable range.

I have already posted this topic to ruby-dev.
http://bugs.ruby-lang.org/issues/5839
Matz agreed to commit this patch to trunk.

Thanks.
-- 
Narihiro Nakamura (nari)