Roger,

You ran this benchmark suite, correct?

http://github.com/acangiano/ruby-benchmark-suite/tree/master

I'd never heard of it before now.  Thanks!

I don't believe that these patches cause GC to run any less frequently by
default.  GC is still run (by default) after allocating 8MB of objects.
Nothing I'm
doing causes Ruby to allocate fewer or smaller objects.  I do believe we are
seeing that applications with large stack space(s) spend a lot of time
during GC scanning each and every word on those stacks.  These patches make
those stacks much smaller and zero out most ghost object pointers so they no
longer need to be marked.
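
To make the trigger concrete, here is a minimal sketch in C of an allocation
counter that starts a collection once a byte limit is exceeded.  It is not the
code from gc.c.  The names (alloc_counter, note_allocation,
SKETCH_MALLOC_LIMIT) are made up for illustration, and I'm assuming "8MB"
means 8 * 1024 * 1024 bytes.  The only point is that the decision depends on
bytes allocated, not on how deep any stack is.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: the "8MB" default, taken here as 8 * 1024 * 1024 bytes. */
#define SKETCH_MALLOC_LIMIT ((size_t)8 * 1024 * 1024)

/* Hypothetical bookkeeping; not the interpreter's real data structure. */
typedef struct {
    size_t   bytes_since_gc; /* bytes allocated since the last collection */
    unsigned gc_runs;        /* number of collections triggered so far */
} alloc_counter;

/* Record an allocation of `size` bytes.
 * Returns 1 if this allocation pushed the counter over the limit
 * (a real collector would run here), 0 otherwise. */
static int note_allocation(alloc_counter *c, size_t size)
{
    assert(c != NULL);
    c->bytes_since_gc += size;
    if (c->bytes_since_gc > SKETCH_MALLOC_LIMIT) {
        c->gc_runs++;
        c->bytes_since_gc = 0; /* start counting again after the GC */
        return 1;
    }
    return 0;
}
```

Stack depth never appears in that test, so shrinking the stacks changes how
long each GC takes, not how often one is started.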

See my comments below, marked "Brent:".


rogerdpack wrote:
> 
> Seems to overall be a tidge slower for "micro" stuff--5 or 10%.
> viz:
> lloyd gc bench:
> ...
> 
> But that's for micro-benchmarks.
> I think the reason we see people's performance increase is that since
> the GC is suddenly more effective, it doesn't get called as often.  A
> big win for larger apps.
> 
> Overall I'd call it a large win for Ruby in terms of being much more
> stable size-wise in a multi-threaded environment and suggest their
> incorporation verbatim.  All 6 :)
> 
> raw ruby-benchmark-suite comparison is in the footnote.
> Note a few things:
> one test erred with normal 1.8.7 but succeeded with MBARI patches
> (core-library/bm_so_concatenate.rb)
> the threaded tests do indeed run faster with MBARI.
> 
> normal:
> core-library/bm_vm3_thread_create_join.rb,0.20678186416626
> patched:
> core-library/bm_vm3_thread_create_join.rb,0.0140390396118164
> 
> Brent:  A >14x speed up.  Whoopie!  :-0
> 
> Some other thoughts I've had are that theoretically you only need to
> clear the stack once between GC's, so you may be able to just keep a
> "range already cleared" per thread or what not, and reset it after
> each GC.  This would especially work if rb_thread_alone is true.
> 
> You might be able to get away with only checking for stack depth once
> every CHECK_INT [instead of with xmalloc].
> 
> Maybe even clear the stack only at ruby_stack_check [though this is
> probably too infrequent].
> 
> Brent:  I don't think that clearing the stack once would be sufficient. 
> And, "clearing the stack" is a bit misleading.  The unused memory area
> between the current stack pointer and its deepest recorded extent is being
> filled with zeros.  It's not really part of the stack when it is cleared. 
> This memory will become stack as new frames are pushed, thus approximating
> the effect of an imaginary compiler option to initialize all local
> variables and temporaries to zero.  So, I've got to clear the stack when
> it is shallow and record its depth when it is deep.  Recording the depth
> is very quick -- only about five machine instructions.  For single
> threaded apps, I could perhaps figure out, when the stack was shallow,
> that a GC was about to occur, and get away with zeroing the stack at that
> one point.  However, recall that the collector scans each thread's stack
> in multithreaded apps (and those using Continuations).  So, I'd need to
> know when a GC or a context switch was going to occur while the stack was
> still shallow.  I haven't figured out how to implement that oracle
> function (and I doubt it is possible).
> 
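
The zeroing described above can be sketched in C with a simulated stack.  This
is only a model under stated assumptions: it uses an ordinary array instead of
the machine stack, it is not the code from the patches, and every name
(sim_stack, sim_push, sim_pop, sim_clear_unused, sim_count_nonzero_live) is
hypothetical.  It shows the two halves of the scheme: record the depth when
the stack is deep, and zero the dead area when it is shallow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIM_STACK_WORDS 1024

/* Hypothetical model of one thread's stack. */
typedef struct {
    uintptr_t words[SIM_STACK_WORDS]; /* simulated stack memory */
    size_t sp;      /* current depth: words[0..sp) are live */
    size_t deepest; /* deepest extent recorded since the last clear */
} sim_stack;

static void sim_init(sim_stack *s)
{
    memset(s, 0, sizeof *s);
}

/* Push a frame of n words WITHOUT initializing it, as C does for locals.
 * Whatever an earlier frame left in that memory is still there. */
static void sim_push(sim_stack *s, size_t n)
{
    assert(s->sp + n <= SIM_STACK_WORDS);
    s->sp += n;
    if (s->sp > s->deepest)
        s->deepest = s->sp; /* the cheap "record the depth" step */
}

/* Pop a frame; the memory keeps its old contents (ghost values). */
static void sim_pop(sim_stack *s, size_t n)
{
    assert(n <= s->sp);
    s->sp -= n;
}

/* Zero the dead area between the current depth and the deepest extent.
 * This memory is not stack right now, but will be when frames are pushed. */
static void sim_clear_unused(sim_stack *s)
{
    size_t i;
    for (i = s->sp; i < s->deepest; i++)
        s->words[i] = 0;
    s->deepest = s->sp;
}

/* Count nonzero words in the live region: a stand-in for the words a
 * conservative collector would have to treat as possible pointers. */
static size_t sim_count_nonzero_live(const sim_stack *s)
{
    size_t i, n = 0;
    for (i = 0; i < s->sp; i++)
        if (s->words[i] != 0)
            n++;
    return n;
}
```

If a deep call writes values into its frame and returns, a later frame pushed
over the same memory sees those ghosts unless sim_clear_unused ran while the
stack was shallow.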
> 
> I did a small experiment with memset versus tight loop and [somehow] a
> tight loop seems to win.
> 
> I think there is some potential for optimization if you were to use
> fixed 2K heap chunks and binary search for is_pointer_to_heap [with
> caching of the most recently found heap chunk to help save on speed].
> Theoretically it might bring RAM usage down even further [1.9 does
> this].
> 
> Brent:  Nothing I'm doing precludes any of these GC optimizations.  
> If they really do help, we could simply backport them.
> 
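
For reference, the lookup Roger suggests could look roughly like the sketch
below: fixed 2K chunks kept in a sorted array, a binary search, and a one-entry
cache of the last chunk that matched.  This is an illustration only, not the
is_pointer_to_heap in gc.c and not taken from 1.9.  The names (heap_index,
in_chunk, is_pointer_to_heap_sketch, CHUNK_BYTES) are hypothetical, and the
sketch leaves out any check that the pointer lands on an object boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK_BYTES 2048u /* the fixed 2K chunk size suggested above */

/* Hypothetical index over the heap's chunks. */
typedef struct {
    const uintptr_t *starts; /* chunk start addresses, sorted ascending */
    size_t count;            /* number of chunks */
    size_t last_hit;         /* index of the most recently matched chunk */
} heap_index;

/* Does p fall inside the chunk that begins at start? */
static int in_chunk(uintptr_t start, uintptr_t p)
{
    return p >= start && p - start < CHUNK_BYTES;
}

/* Returns 1 if p points into some chunk, 0 otherwise. */
static int is_pointer_to_heap_sketch(heap_index *h, uintptr_t p)
{
    size_t lo = 0, hi = h->count;

    if (h->count == 0)
        return 0;
    assert(h->last_hit < h->count);

    /* Fast path: try the chunk that matched last time. */
    if (in_chunk(h->starts[h->last_hit], p))
        return 1;

    /* Binary search: afterwards lo is the number of chunks starting at or
     * below p, so lo - 1 is the only chunk that could contain p. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (h->starts[mid] <= p)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return 0; /* p lies below every chunk */
    if (in_chunk(h->starts[lo - 1], p)) {
        h->last_hit = lo - 1; /* remember the chunk for the next lookup */
        return 1;
    }
    return 0;
}
```

Smaller, cleaner stacks mean fewer words get passed to a test like this in the
first place, so the two approaches should combine rather than compete.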
> 
> I know that I, at least, will definitely use these for my own
> apps so that they have more control over memory.
> 
> Re: javaeye.com speed "almost the same" with railsbench GC patch +
> these versus just railsbench GC patch--I think that what is happening
> in this case is that GC is being called only when the freelist is used
> up, since the malloc_limit is so large.  Tough to know how to speed it
> up in that case [except for running GC in a different process and
> earlier].
> 
> Thanks for your hard work.  I think it was something a few of us had
> thought necessary but never got up the gumption to do :)
> 
> Brent:  Necessity is often the mother of "gumption".  It certainly was in
> this case.
> 
> -=r
> 
> 

-- 
View this message in context: http://www.nabble.com/-ruby-core%3A19846---Bug--744--memory-leak-in-callcc--tp20447794p21182175.html
Sent from the ruby-core mailing list archive at Nabble.com.