Much of this is excerpted from updates at:

http://sites.google.com/site/brentsrubypatches/

I believe I discovered two latent MRI bugs that had been unearthed by my
factoring of rb_eval() into separate eval functions for each node type (in
MBARI 4).  This was all done in the course of determining why the MBARI7
patch set, while very solid on x86-32, was segfaulting on x86-64 CPUs. 

First:
The function evaluating a string literal node would, in some cases, pass the
internal pointer of a newly created ruby string object into rb_reg_new(),
which would derive a new regular expression object from it.  Trouble was,
gcc, when optimizing on a machine like the x86-64, would determine that the
pointer to the newly created string object need not be stacked and in fact
could be entirely "forgotten" as soon its text buffer was passed into
rb_reg_new().  Nothing wrong with that... unless a GC pass happened to be
triggered while deriving the new regular expression from that string
object's internal text buffer.  In which case, that now unreferenced String
object would never be marked and, as a result, that String and its text
buffer would be prematurely freed, trashing the regular expression pattern,
resulting in very occasional regex match failures and (very rare) heap
corruption.

The best test case I came up with for this first bug was:

   $ ruby -e "GC.stress=true; load 'test/pathtest/test_pathtest.rb'"

Lots of test failures with MBARI 7 (only on x86-64),  all passed with MBARI
8

The RB_GC_GUARD() macro introduced in 1.8.7 works nicely to prevent this
sort of problem:
+       RB_GC_GUARD(str);  /* prevent tail call optimization here */
       return str2;
     case NODE_DREGX_ONCE:	/* regexp expand once */
       str2 = rb_reg_new(RSTRING(str)->ptr, RSTRING(str)->len,
 			  node->nd_cflag);
       nd_set_type(node, NODE_LIT);
+      RB_GC_GUARD(str);  /* ensure str is not GC'd in rb_reg_new */
       return node->nd_lit = str2;


Second:
eval.c is full of setjmp()s and longjmp() calls.  These are error prone
constructs for a number reasons.  The most insidious of which is the fact
that the 'C' spec does not require that non-volatile local variables in the
function containing a setjmp() be preserved when it returns via longjmp(). 
A few of the unpatched functions containing EXEC_TAG() in eval.c missed this
point, failing to declare as 'volatile' variables that might need to be
preserved on the stack in exceptional cases (redo clauses in some contexts,
for example).  And, many variables were declared volatile that did not need
to be, adding to the confusion.  This coding rule is difficult to maintain
during incremental development, even if one follows it properly in the first
place.  I added a few such errors of my own with MBARI4, because I did not
fully understand the volatile qualifier in this context and tried to follow
the "patterns" I saw in the existing code.    The volatile qualifier here is
to prevent variables from being cached in registers.  So, of course, CPUs
with large register files would be most susceptible to this class of bug.


Two independent testers had more elaborate test cases that failed very
occasionally on x86-64 with the MBARI 7 patches on x86-64.  They both report
that their tests now work with MBARI 8A.

Here is one of the simpler examples of volatile qualifiers being misplaced
in the unpatched eval.c:
(See if you don't agree)

/* function to call func under the specified class/module context */
static VALUE
exec_under(func, under, cbase, args)
    VALUE (*func)();
-    VALUE under, cbase;
+    VALUE under;
+    volatile VALUE cbase;
    void *args;
{
-    VALUE val = Qnil;		/* OK */
+    VALUE val;
    int state;
-    int mode;
+    volatile int mode;
    struct FRAME *f = ruby_frame;

    PUSH_CLASS(under);
    PUSH_FRAME();
    ruby_frame->self = f->self;
    ruby_frame->last_func = f->last_func;
    ruby_frame->orig_func = f->orig_func;
    ruby_frame->last_class = f->last_class;
    ruby_frame->argc = f->argc;
    if (cbase) {
	PUSH_CREF(cbase);
    }

    mode = scope_vmode;
    SCOPE_SET(SCOPE_PUBLIC);
    PUSH_TAG(PROT_NONE);
    if ((state = EXEC_TAG()) == 0) {
	val = (*func)(args);
    }
    POP_TAG();
    if (cbase) POP_CREF();
    SCOPE_SET(mode);
    POP_FRAME();
    POP_CLASS();
    if (state) JUMP_TAG(state);

    return val;
}


Unfortunately, there is a lot of this sort of thing in eval.c.
MBARI8 is a big patch as a result.

I'm guessing that we are not seeing failures in the field because the x86-32
has very few registers, so, in practice, variables are usually on the stack
and thus preserved through longjmps even when they are not explicitly
declared volatile.

I'd actually be quite relieved if I were proven wrong about this :-)
Otherwise, these fixes will be in the patches I submit upstream.

The only bright side is that, by removing the volatile qualifier from
variables where
it was unnecessary, MBARI8 on x86-64 seems to have realized a few percent
speed improvement running
the ruby test suite over both unpatched Ruby and the Ruby with the MBARI7
patch.


- brent


-- 
View this message in context: http://www.nabble.com/MBARI8-patch-fixes-bugs-caused-by-incorrect-volatile-variable-declarations-tp22259357p22259357.html
Sent from the ruby-core mailing list archive at Nabble.com.