> You ran this benchmark suite, correct?
>
> http://github.com/acangiano/ruby-benchmark-suite/tree/master

Yeah, that one, and also http://lloydforge.org/projects/misc/
The latter takes considerably less time to run :)


> I don't believe that these patches cause GC to run any less frequently by
> default.
> GC is still run (by default) after allocating 8MB of objects.  Nothing I'm
> doing causes Ruby to allocate fewer or smaller objects.  I do believe we are
> seeing that applications with large stack space(s) spend a lot of time
> during GC scanning each and every word on those stacks.  These patches make
> those stacks much smaller and zero out most ghost object pointers so they no
> longer need to be marked.

It would be interesting to see whether GC is being triggered by malloc
(hitting the malloc limit) or by running out of free-list slots.  If
it's the latter, then the patches could indeed make GC less frequent,
since objects that ghost pointers used to keep alive now get freed and
refill the free list.  If not, then maybe it's as you said: GC just
takes less time because there's less to traverse during the mark phase.

>> Brent:  A >14x speed up.  Whoopie!  :-0
Yeah, I think multi-threaded apps will definitely benefit from this.
Unfortunately, most benchmarks are single-threaded micro-benchmarks, so
they won't show the "real" speedup [Antonio's included].


>> Brent:  I don't think that clearing the stack once would be sufficient.
>> And, "clearing the stack" is a bit misleading.  The unused memory area
>> between the current stack pointer and its deepest recorded extent is being
>> filled with zeros.  It's not really part of the stack when it is cleared.
>> This memory will become stack as new frames are pushed, thus approximating
>> the effect of an imaginary compiler option to initialize all local
>> variables and temporaries to zero.  So, I've got to clear the stack when
>> it is shallow and record its depth when it is deep.  Recording the depth
>> is very quick -- only about five machine instructions.  For single
>> threaded apps, I could perhaps figure out, when the stack was shallow,
>> that a GC was about to occur, and get away with zeroing the stack at that
>> one point.  However, recall that  the collector scans each thread's stack
>> in multithreaded apps (and those using Continuations).  So, I'd need to
>> know when a GC or a context switch was going to occur while the stack was
>> still shallow.  I haven't figured out how to implement that oracle
>> function (and I doubt it is possible).
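Just to check I follow the mechanism, here's a toy simulation in C.  It
uses an array of words standing in for a downward-growing stack rather
than touching the real C stack, and every name is hypothetical, not
taken from the patches.  Pushing a frame records the deepest extent (one
compare and store), popping leaves the dead words in place, clearing
zeroes only the region between the current top and the deepest recorded
extent, and the "scan" mimics a conservative collector by counting
non-zero words from the top of stack down to the base.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stack of machine words growing toward index 0, standing in for
 * the real C stack. All names here are hypothetical. */
#define TOY_STACK_WORDS 1024

struct toy_stack {
    uintptr_t words[TOY_STACK_WORDS];
    size_t sp;      /* index of the current top of stack */
    size_t deepest; /* lowest sp seen since the last clear */
};

void toy_stack_init(struct toy_stack *s) {
    memset(s->words, 0, sizeof s->words);
    s->sp = TOY_STACK_WORDS; /* empty stack: top == base */
    s->deepest = TOY_STACK_WORDS;
}

/* Push an n-word frame WITHOUT initializing it (like uninitialized
 * locals) and record the depth: just one compare and one store. */
void toy_push(struct toy_stack *s, size_t n) {
    assert(n <= s->sp);
    s->sp -= n;
    if (s->sp < s->deepest)
        s->deepest = s->sp;
}

/* Write a value into slot i of the current frame. */
void toy_store(struct toy_stack *s, size_t i, uintptr_t v) {
    assert(s->sp + i < TOY_STACK_WORDS);
    s->words[s->sp + i] = v;
}

/* Pop an n-word frame; its words stay in memory as "ghosts". */
void toy_pop(struct toy_stack *s, size_t n) {
    assert(s->sp + n <= TOY_STACK_WORDS);
    s->sp += n;
}

/* Call while the stack is shallow: zero the dead region between the
 * deepest recorded extent and the current top, then reset the record. */
void toy_clear_below(struct toy_stack *s) {
    if (s->sp > s->deepest)
        memset(&s->words[s->deepest], 0,
               (s->sp - s->deepest) * sizeof s->words[0]);
    s->deepest = s->sp;
}

/* Stand-in for a conservative scan: count the non-zero words between
 * the top of stack and the base, i.e. the possible pointers. */
size_t toy_scan_nonzero(const struct toy_stack *s) {
    size_t n = 0;
    for (size_t i = s->sp; i < TOY_STACK_WORDS; i++)
        if (s->words[i] != 0)
            n++;
    return n;
}
```

What the toy shows: a fresh frame whose locals are never written still
exposes the old values to the scan, until the region below a shallow
top of stack has been zeroed.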

Hmm, so the biggest speed hit is probably in clearing the stack
[over and over], right? [judging from your comment that recording the
depth is cheap].

I was just suggesting that once a thread has [reached a very shallow
spot and cleared the stack in its entirety], it only needs to repeat
that after the next GC.  Leftover references from this round will be
cleared [once] after the subsequent GC (when the thread reaches a
shallow point again).  So if you're willing to wait a couple of GCs,
you only have to clear once per GC, per thread.  So the oracle is "do
it once after each GC."
Sorry, it's hard to explain.
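Roughly what I mean by that oracle, as a C sketch with hypothetical
names: rather than resetting a flag on every thread at GC time, the
collector just bumps a global epoch counter, and each thread remembers
the epoch in which it last cleared.  At a shallow point a thread clears
only if the epoch has moved since then, so it clears at most once per
GC.  This assumes only one thread runs at a time (as with green
threads), so there's no locking, and the actual zeroing is left as a
comment.

```c
#include <stdbool.h>

/* Hypothetical sketch of "clear once per GC, per thread". Assumes only
 * one thread runs at a time (green threads), so there is no locking. */

/* Bumped by the collector after every GC. Starts at 1 so that a new
 * thread, whose cleared_epoch starts at 0, also clears once before the
 * first GC. */
static unsigned long gc_epoch = 1;

struct toy_thread {
    unsigned long cleared_epoch; /* gc_epoch when this thread last cleared */
    unsigned clears;             /* clears performed, for illustration */
};

/* Hook the collector would call at the end of each GC. */
void toy_gc_finished(void) {
    gc_epoch++;
}

/* Hook a thread would call when its stack is shallow. Returns true if
 * it cleared, false if it had already cleared since the last GC. */
bool toy_maybe_clear(struct toy_thread *t) {
    if (t->cleared_epoch == gc_epoch)
        return false;
    /* ...zero the dead region below the stack pointer here... */
    t->clears++;
    t->cleared_epoch = gc_epoch;
    return true;
}
```

The collector-side cost is one increment per GC and the thread-side
cost at each shallow point is one compare, so the expensive zeroing
happens at most once per GC per thread.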

Anyway, imagine a single-threaded app.  As long as that app clears the
stack "once and well" [say, the first time it gets very high it cleans
off the whole thing, or it accomplishes this piece-wise as it grows
high the first time], then, in a staggered way, every reference to
garbage will eventually be zeroed out and the object collected.
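Here's that staggered timeline as a self-contained toy in C
(hypothetical names again: an array stands in for the stack, small
integer ids stand in for object pointers, and the "GC" keeps an object
alive only if its id appears between the top of stack and the base).
The thread has already cleared in the current epoch when a local goes
dead, so the ghost survives GC #1; the first shallow point after GC #1
zeroes it, and GC #2 frees the object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy single-threaded timeline; every name is hypothetical. The array
 * grows toward index 0, and object "pointers" are ids 1..SIM_OBJS-1
 * (0 means no reference). Only the stack acts as a root. */
#define SIM_WORDS 16
#define SIM_OBJS 8

struct sim {
    uintptr_t stack[SIM_WORDS];
    size_t sp;                   /* current top of stack */
    size_t deepest;              /* deepest extent since the last clear */
    bool allocated[SIM_OBJS];    /* allocated[id]: object not yet freed */
    unsigned long gc_epoch;      /* bumped after every GC */
    unsigned long cleared_epoch; /* epoch of the thread's last clear */
};

void sim_init(struct sim *s) {
    memset(s, 0, sizeof *s);
    s->sp = s->deepest = SIM_WORDS;
    s->gc_epoch = 1; /* != cleared_epoch (0): first shallow point clears */
}

/* Push an uninitialized n-word frame and record the depth. */
void sim_push(struct sim *s, size_t n) {
    assert(n <= s->sp);
    s->sp -= n;
    if (s->sp < s->deepest)
        s->deepest = s->sp;
}

/* Pop an n-word frame, leaving its words behind as ghosts. */
void sim_pop(struct sim *s, size_t n) {
    assert(s->sp + n <= SIM_WORDS);
    s->sp += n;
}

/* Conservative mark: any stack word equal to a valid id keeps that
 * object alive. Sweep: everything unmarked is freed. */
void sim_gc(struct sim *s) {
    bool marked[SIM_OBJS] = {false};
    for (size_t i = s->sp; i < SIM_WORDS; i++)
        if (s->stack[i] > 0 && s->stack[i] < SIM_OBJS)
            marked[s->stack[i]] = true;
    for (size_t id = 1; id < SIM_OBJS; id++)
        if (!marked[id])
            s->allocated[id] = false;
    s->gc_epoch++;
}

/* At a shallow point, zero the dead region at most once per GC epoch. */
void sim_shallow_point(struct sim *s) {
    if (s->cleared_epoch == s->gc_epoch)
        return;
    if (s->sp > s->deepest)
        memset(&s->stack[s->deepest], 0,
               (s->sp - s->deepest) * sizeof s->stack[0]);
    s->deepest = s->sp;
    s->cleared_epoch = s->gc_epoch;
}
```

So the cost of only clearing once per GC is latency, not leakage: a
ghost can survive one extra collection but not two.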
Not that it really matters; I'm just trying to make sure I've
explained my thought well.
Thoughts?
-=r