Much of this is excerpted from updates at: http://sites.google.com/site/brentsrubypatches/ I believe I discovered two latent MRI bugs that had been unearthed by my factoring of rb_eval() into separate eval functions for each node type (in MBARI 4). This was all done in the course of determining why the MBARI7 patch set, while very solid on x86-32, was segfaulting on x86-64 CPUs. First: The function evaluating a string literal node would, in some cases, pass the internal pointer of a newly created ruby string object into rb_reg_new(), which would derive a new regular expression object from it. Trouble was, gcc, when optimizing on a machine like the x86-64, would determine that the pointer to the newly created string object need not be stacked and in fact could be entirely "forgotten" as soon its text buffer was passed into rb_reg_new(). Nothing wrong with that... unless a GC pass happened to be triggered while deriving the new regular expression from that string object's internal text buffer. In which case, that now unreferenced String object would never be marked and, as a result, that String and its text buffer would be prematurely freed, trashing the regular expression pattern, resulting in very occasional regex match failures and (very rare) heap corruption. The best test case I came up with for this first bug was: $ ruby -e "GC.stress=true; load 'test/pathtest/test_pathtest.rb'" Lots of test failures with MBARI 7 (only on x86-64), all passed with MBARI 8 The RB_GC_GUARD() macro introduced in 1.8.7 works nicely to prevent this sort of problem: + RB_GC_GUARD(str); /* prevent tail call optimization here */ return str2; case NODE_DREGX_ONCE: /* regexp expand once */ str2 = rb_reg_new(RSTRING(str)->ptr, RSTRING(str)->len, node->nd_cflag); nd_set_type(node, NODE_LIT); + RB_GC_GUARD(str); /* ensure str is not GC'd in rb_reg_new */ return node->nd_lit = str2; Second: eval.c is full of setjmp()s and longjmp() calls. These are error prone constructs for a number reasons. The most insidious of which is the fact that the 'C' spec does not require that non-volatile local variables in the function containing a setjmp() be preserved when it returns via longjmp(). A few of the unpatched functions containing EXEC_TAG() in eval.c missed this point, failing to declare as 'volatile' variables that might need to be preserved on the stack in exceptional cases (redo clauses in some contexts, for example). And, many variables were declared volatile that did not need to be, adding to the confusion. This coding rule is difficult to maintain during incremental development, even if one follows it properly in the first place. I added a few such errors of my own with MBARI4, because I did not fully understand the volatile qualifier in this context and tried to follow the "patterns" I saw in the existing code. The volatile qualifier here is to prevent variables from being cached in registers. So, of course, CPUs with large register files would be most susceptible to this class of bug. Two independent testers had more elaborate test cases that failed very occasionally on x86-64 with the MBARI 7 patches on x86-64. They both report that their tests now work with MBARI 8A. Here is one of the simpler examples of volatile qualifiers being misplaced in the unpatched eval.c: (See if you don't agree) /* function to call func under the specified class/module context */ static VALUE exec_under(func, under, cbase, args) VALUE (*func)(); - VALUE under, cbase; + VALUE under; + volatile VALUE cbase; void *args; { - VALUE val = Qnil; /* OK */ + VALUE val; int state; - int mode; + volatile int mode; struct FRAME *f = ruby_frame; PUSH_CLASS(under); PUSH_FRAME(); ruby_frame->self = f->self; ruby_frame->last_func = f->last_func; ruby_frame->orig_func = f->orig_func; ruby_frame->last_class = f->last_class; ruby_frame->argc = f->argc; if (cbase) { PUSH_CREF(cbase); } mode = scope_vmode; SCOPE_SET(SCOPE_PUBLIC); PUSH_TAG(PROT_NONE); if ((state = EXEC_TAG()) == 0) { val = (*func)(args); } POP_TAG(); if (cbase) POP_CREF(); SCOPE_SET(mode); POP_FRAME(); POP_CLASS(); if (state) JUMP_TAG(state); return val; } Unfortunately, there is a lot of this sort of thing in eval.c. MBARI8 is a big patch as a result. I'm guessing that we are not seeing failures in the field because the x86-32 has very few registers, so, in practice, variables are usually on the stack and thus preserved through longjmps even when they are not explicitly declared volatile. I'd actually be quite relieved if I were proven wrong about this :-) Otherwise, these fixes will be in the patches I submit upstream. The only bright side is that, by removing the volatile qualifier from variables where it was unnecessary, MBARI8 on x86-64 seems to have realized a few percent speed improvement running the ruby test suite over both unpatched Ruby and the Ruby with the MBARI7 patch. - brent -- View this message in context: http://www.nabble.com/MBARI8-patch-fixes-bugs-caused-by-incorrect-volatile-variable-declarations-tp22259357p22259357.html Sent from the ruby-core mailing list archive at Nabble.com.