Stephen,

I updated the MBARI7 patch at http://sites.google.com/site/brentsrubypatches
again last night (on 1/11/09) before I'd read your post.  (Sorry)

I had already concluded that -mpreferred-stack-boundary=2 is generally a
"bad idea" and have removed it from the recommended options.  It has
portability problems and, even where it works, the net loss in speed is not
worth the small reduction in stack usage for most Ruby scripts.  One option
that does increase speed by about 7% across the board is -fomit-frame-pointer.
It seems to work well with most recent gcc compilers, but segfaults on older
ones, so I'm not recommending it by default.  I believe that Microsoft C
has an analogous option.

This latest update to MBARI7 adds a configuration option to select the
method used to clear the stack among four alternatives.  The default is to
use a (new) portable method that allocates the "dirty" stack briefly with
alloca() before clearing it.  This portable method costs time (~1.5%), but
it is safer.
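
For illustration only, here is a minimal sketch of the general idea behind
an alloca()-based wipe.  It is not the MBARI7 code: the helper name
wipe_dirty_stack and its byte-count parameter are made up for this example,
and it assumes a downward-growing stack and a platform that provides
<alloca.h>.

```c
#include <alloca.h>   /* glibc and macOS; some platforms declare alloca() elsewhere */
#include <stddef.h>

/* Hypothetical helper, not taken from the MBARI patches.
 * Briefly claims nbytes of stack beyond the current frame with alloca()
 * and zeroes it.  The block is released automatically on return.
 * Returns the number of bytes cleared. */
static size_t wipe_dirty_stack(size_t nbytes)
{
    volatile unsigned char *p;
    size_t i;

    if (nbytes == 0)
        return 0;

    p = alloca(nbytes);
    for (i = 0; i < nbytes; i++)
        p[i] = 0;     /* volatile stores, so the compiler cannot drop the loop */
    return i;
}
```

Because the region is properly allocated while it is being zeroed, the
compiler cannot place live data (such as a pushed register) inside it, which
is roughly why this style is safer than zeroing memory beyond the stack
pointer directly.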

In practice, the 32-bit x86 is so starved for registers that I'd seen cases
where gcc would emit a PUSH %ESP between the point in the (old, fast) stack
clearing routine that read the stack pointer and the loop that was to zero
unallocated stack above the top.  This would cause the stacked base pointer
to be cleared as well and yield a segfault when it was later POP'ed from the
stack.  Fortunately, if this happens, the resulting Ruby binary fails
immediately on the (bogus1.rb and bogus2.rb) test scripts included with the
patches. 

Ironically, -mpreferred-stack-boundary=2 will make the new, portable stack
clearing method ineffective due to gcc's insistence that alloca(x>0) always
return a 16-byte aligned pointer regardless of the configured
preferred-stack-boundary.  This might be considered a bug, but I'm honestly
not sure.
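
Anyone who wants to check the alloca() alignment on their own compiler and
flags could use a small probe along these lines.  This is only a sketch: the
function name alloca_misalignment is invented for this example, it assumes
<alloca.h> is available, and nbytes must be greater than zero.

```c
#include <alloca.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical probe, not part of the MBARI patches.
 * Returns how far an alloca() result sits past a 16-byte boundary.
 * 0 means the pointer is 16-byte aligned.  nbytes must be > 0. */
static unsigned alloca_misalignment(size_t nbytes)
{
    volatile char *p = alloca(nbytes);

    p[0] = 0;   /* touch the block so the allocation is actually used */
    return (unsigned)((uintptr_t)p & 15u);
}
```

Building this once with and once without -mpreferred-stack-boundary=2 (on a
target that accepts the flag) and printing the result for a few small sizes
should show whether the returned pointer stays 16-byte aligned in both cases.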

I cannot seem to find a stack clearing method that is both safe and
portable.  Maybe others will succeed where I have punted.  For now, my tests
indicate that, on 32-bit x86 with gcc 4.3, the combination of

CFLAGS="-O2 -fomit-frame-pointer -fno-stack-protector"
and
#define STACK_WIPE_SITES 0x4370  /* in rubysig.h */

works best.  It protects against ghost references well and runs even
micro-benchmarks slightly faster than unpatched 1.8.7-p72.


- brent


Stephen Sykes-3 wrote:
> 
> Brent,
> 
> A report from the field...
> 
> We have been using your patches in a production Rails environment
> since you released them, and this is on x86_64-linux.
> 
> We notice no problems, ruby works well and is significantly faster.
> 
> And to keep up to date, we just applied patch MBARI7 (from
> http://sites.google.com/site/brentsrubypatches/ ) with the default
> configuration.  FWIW we see a further small performance improvement,
> something like 5% on a rough measurement.
> 
> Just a note on your build instructions: the
> -mpreferred-stack-boundary=2 flag causes configure to fail on OSX,
> complaining that it can't find the size of int (the program to do so
> segfaults).  And that setting is not accepted by gcc on x86_64 because
> it needs the boundary to be 4 or more.  In both cases I removed the
> option and all works fine.
> 
> Regards,
> Stephen
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/-ruby-core%3A19846---Bug--744--memory-leak-in-callcc--tp20447794p21422034.html
Sent from the ruby-core mailing list archive at Nabble.com.