Mike,

Certainly, if one copies a byte at a time, performance will be awful.
I'm copying aligned words, one Ruby VALUE-sized word at a time.

As an experiment, I tried substituting memset() for my tight stack
clearing loop...

and discovered that memset() is actually quite a large function, and
gcc does not inline it.  It is large because, in this context, the
compiler cannot tell that the pointers are already long-word aligned
and that we are clearing a whole number of long words.  So it emits
code to handle stray bytes on either end.  And, since we're trying to
clear memory from the current stack pointer down, we must also add a
kludgy offset to avoid wiping memset()'s own stack frame.
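
To make the alignment point concrete, here is a minimal sketch of the two
approaches side by side. It is not the rubysig.h code: word_t and both
function names are made up for illustration, and unsigned long merely stands
in for ruby's VALUE. Both functions clear the same range, but only the loop
tells the compiler that the work is done in whole, aligned words.

```c
#include <stddef.h>
#include <string.h>

typedef unsigned long word_t;   /* hypothetical stand-in for ruby's VALUE */

/* Clear whole words from just below sp down to end, inclusive.
   Every store is a naturally aligned word store. */
static void zero_words_down(word_t *end, word_t *sp)
{
    while (end <= --sp)
        *sp = 0;
}

/* Same range via memset().  The byte count is always a multiple of
   sizeof(word_t), but a generic out-of-line memset() cannot assume
   that, so it still carries code for unaligned heads and tails. */
static void zero_words_memset(word_t *end, word_t *sp)
{
    if (sp > end)
        memset(end, 0, (size_t)(sp - end) * sizeof(word_t));
}
```

Note that neither function here needs the stack-frame offset, because the
sketch is meant to be run on an ordinary buffer rather than the live stack.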

If anyone else wants to try this on an x86, in rubysig.h, change:

#define __stack_zero_down(end,sp)  while (end <= --sp) *sp=0
to:
#define __stack_zero_down(end,sp) \
  if (sp-6 > end) memset(end, 0, (char *)(sp-6) - (char *)end)

My tiny "bogus1" and "bogus2" show no measurable improvement, but
perhaps it might help for a larger application.
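
For anyone who would rather time the two approaches in isolation first, a
rough harness along these lines might do. This is only a sketch under stated
assumptions: bench_word, clear_fn, clear_loop, clear_memset and time_clear are
all hypothetical names, the buffer is an ordinary array rather than the real
stack, and clock() is coarse, so only large repeat counts give usable numbers.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

typedef unsigned long bench_word;   /* hypothetical stand-in for VALUE */
typedef void (*clear_fn)(bench_word *buf, size_t nwords);

/* Word-at-a-time clear, walking downward like the stack loop. */
static void clear_loop(bench_word *buf, size_t nwords)
{
    bench_word *p = buf + nwords;
    while (p != buf)
        *--p = 0;
}

/* Library clear of the same range. */
static void clear_memset(bench_word *buf, size_t nwords)
{
    memset(buf, 0, nwords * sizeof *buf);
}

/* CPU seconds spent calling fn on buf, reps times in a row. */
static double time_clear(clear_fn fn, bench_word *buf, size_t nwords, long reps)
{
    clock_t start = clock();
    for (long i = 0; i < reps; i++) {
        buf[0] = 1;             /* dirty one word so no pass is a pure no-op */
        fn(buf, nwords);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_clear(clear_loop, buf, n, reps) and then
time_clear(clear_memset, buf, n, reps) with the same arguments gives two
figures to compare. Build it with the same compiler flags as ruby itself,
since the result depends heavily on the optimization level.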

On the other hand...
Very recently, folks who've looked into this far more intensively than
I concluded that an unrolled 'C' loop was better than the venerable

  rep stosl

assembly instructions used by x86 gcc's __builtin_memset().  See:

http://sourceware.org/ml/newlib/2008/msg00286.html

They note that microcoded instructions are slower than simple ones on
modern x86 (RISC-ish) execution cores.  The fastest way to clear
memory these days is supposedly to use MMX instructions.
(I'm not going there, but I welcome others to explore where that
might lead :-)
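
For reference, an unrolled word-clearing loop in plain C might look roughly
like the sketch below. It is not taken from the newlib thread: the name
zero_words_unrolled, the uword type and the unroll factor of four are all
arbitrary choices made for illustration.

```c
#include <stddef.h>

typedef unsigned long uword;    /* hypothetical stand-in for VALUE */

/* Zero nwords words starting at p, four simple stores per iteration,
   then mop up the zero to three words that are left over. */
static void zero_words_unrolled(uword *p, size_t nwords)
{
    size_t i = 0;

    for (; i + 4 <= nwords; i += 4) {
        p[i]     = 0;
        p[i + 1] = 0;
        p[i + 2] = 0;
        p[i + 3] = 0;
    }
    for (; i < nwords; i++)
        p[i] = 0;
}
```

Whether this beats the simple loop depends on the compiler and the CPU, so
it is worth checking the generated assembly before drawing conclusions.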

- brent


Michael Selig wrote:
> 
> On Mon, 22 Dec 2008 20:59:05 +1100, Brent Roman <brent / mbari.org> wrote:
> 
>> I suspect memzero would be slower than the tight loop I have zeroing the
>> stack now.
> 
> In my experience on x86 architecture using GCC, "memset(p, 0, len)" is  
> substantially faster than a tight loop (between 2 & 10 times faster  
> depending whether the loop is byte-by-byte or word-by-word). This is  
> because GCC knows to optimize "memset" inline to a single instruction (or  
> close to it).
> 
> Mike
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/-ruby-core%3A19846---Bug--744--memory-leak-in-callcc--tp20447794p21156358.html
Sent from the ruby-core mailing list archive at Nabble.com.