After a couple weeks of long nights and false starts, I feel I may have come
up with
a fix for a large class of Ruby memory leak.  The basic technique is a
refinement of the
one Kurt Stephens suggested.  It not only eliminates the leaks in this one
liner:

  loop {@x=callcc{|c|c}}

but also in our multi-threaded robotics application.  Our Ruby process used
to grow
to 20+ MB during a day long run.  The same run now stays smaller than 10MB.
On an embedded ARM Linux machine with only 32MB of DRAM, this is a great
result!

The central problem is that gcc (and other compilers) tend to create
sparse stack frames such that, when a new frame is pushed onto the stack, it
does not 
completely overwrite the one that had been previously stored there.  The new
frame gets
activated with old VALUE pointers preserved inside its holes.  These become
"live" again
as far as any conservative garbage collector is concerned.  And, viola, a
leak is born!

I implemented a scheme for recording the maximum depth of the C stack in
xmalloc and during garbage collection itself.  However, I realized that 
there was no point in clearing the stack when it is near its maximum depth.
Instead, stack clearing is deferred until CHECK_INTS, as this tends to
happen
between evaluation of nodes, when the stack is likely to be shallower. 

At this point
a tight loop quickly zeros the region between the current top of stack, as 
returned by alloca(0), and the maximum recorded stack extent.  It also
updates
the stack extent so no memory is cleared repeatedly if the stack contracts
further.

This paper discusses this and similar techniques:
http://www.hpl.hp.com/personal/Hans_Boehm/gc/papers/pldi93.ps.Z

Another related issue is that the style of rb_eval() in eval.c in the
1.8 and 1.6 series causes gcc to emit a especially large and sparse stack
frames.
Consider that gcc allocates two pair of stack slots for r and l in
constructs like this:

    switch (nd_type(node)) {
 	/* nodes for speed-up(literal match) */
      case NODE_MATCH2:
	{
	    VALUE l = rb_eval(self,node->nd_recv);
	    VALUE r = rb_eval(self,node->nd_value);
	    result = rb_reg_match(l, r);
	}
	break;

	/* nodes for speed-up(literal match) */
      case NODE_MATCH3:
	{
	    VALUE r = rb_eval(self,node->nd_recv);
	    VALUE l = rb_eval(self,node->nd_value);
....

By the time the compiler's optimizer is allocating stack frame slots, all
the block structure
of the original code has been lost in various transformations.
As a result, each rb_eval() call ends up pushing about 4k bytes onto the C
stack, 
of which less than 20% is even initialized.    This means that:

1)  There is a high probability that old VALUEs from previous frames 
     will be resurrected as the stack grows,

2)  The GC must scan a sparse, large stack and mark the many dead object
pointers it contains.

3)  callcc and thread context switches must copy needlessly large stacks

4)  recursive Ruby programs run out of stack space much earlier than than
they might otherwise.

When I simply re-factored rb_eval() such that it calls a (non-inline)
function
for each node type it encounters, the total observed C stack size for my
application
was reduced by more than two thirds.  Not surprisingly, threading and
continuation
micro benchmarks and run about 3 - 4 times faster.  However, I expect that
benchmarks that
operate repeatedly on a few large, long lived objects will run slower.

Keep in mind that these techniques should improve the performance of *any*
garbage
collector that scans the unstructured C stack for valid object pointers.  It
may
even be relevant for the 1.9 series Ruby, but I'll leave that for those more
qualified to determine.

Today, this is implemented only in my heavily patched version of Ruby 1.6.8.
In the short term, if there's interest,
I can quickly post my hacked 1.6.8 Ruby to an FTP site for others to test.

Longer term,
The stack clearing could be supplied as a small patch to the 1.8 series,
however the
refactoring of rb_eval() is probably too large to be attached to an email
message
on this list.  I will take the time to produce these patches only if at
least a few people
commit to testing them,  reporting detailed results and suggestions for
improvement here.

- brent


-- 
View this message in context: http://www.nabble.com/-ruby-core%3A19846---Bug--744--memory-leak-in-callcc--tp20447794p20731668.html
Sent from the ruby-core mailing list archive at Nabble.com.