Issue #4612 has been updated by Tomoyuki Chikanaga.


Hi,

Sorry for the late reply, and thank you for reporting this bug :)

I've confirmed that I can reproduce the segv and that your patch fixes the problem.
Also, make test-all reports no additional errors.
I think your patch is reasonable, so I'll check it in to trunk.

 >BTW, while following the code around fibers and continuations, 
 >I've found another curious thing: fiber_free() calls cont_free(&fib->cont)
 >on its cont member, and cont_free() calls ruby_xfree() on it. Is that ok,
 >given cont was allocated as part of fiber, and don't we need to ruby_xfree the fib itself?
'cont' is the first member of the structure 'rb_fiber_t', so
(void *)(&fib->cont) == (void *)fib
cont_free() therefore already releases the whole fiber allocation.
That's why we don't need to (in fact, must not) call ruby_xfree() on fib itself.
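
To illustrate the point about the first member, here is a minimal sketch in plain C. The demo_* types and functions are hypothetical stand-ins (not the real definitions in cont.c), and free() stands in for ruby_xfree():

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for rb_context_t and rb_fiber_t; the real
 * structs in cont.c have many more fields. */
typedef struct demo_context {
    int saved_state;
} demo_context_t;

typedef struct demo_fiber {
    demo_context_t cont;   /* first member: shares the fiber's address */
    int status;
} demo_fiber_t;

/* Same shape as cont_free(): releases the block that starts at cont. */
static void demo_cont_free(demo_context_t *cont)
{
    free(cont);
}

/* Same shape as fiber_free(): frees the fiber through its first member.
 * Calling free(fib) after this would be a double free. */
static void demo_fiber_free(demo_fiber_t *fib)
{
    assert((void *)&fib->cont == (void *)fib);
    demo_cont_free(&fib->cont);
}

/* Returns 1 when &fib->cont and fib are the same address, -1 if the
 * allocation fails. */
static int demo_cont_aliases_fiber(void)
{
    demo_fiber_t *fib = malloc(sizeof(*fib));
    int same;

    if (fib == NULL) return -1;
    same = ((void *)&fib->cont == (void *)fib);
    demo_fiber_free(fib);   /* the only free needed for this allocation */
    return same;
}
```

This only holds while 'cont' stays the first member; if another field were placed before it, the pointer handed to the free function would no longer be the one returned by the allocator.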
----------------------------------------
Bug #4612: Segmentation fault in fiber GC mark cycle
http://redmine.ruby-lang.org/issues/4612

Author: Serge Balyuk
Status: Open
Priority: Normal
Assignee: 
Category: 
Target version: 
ruby -v: ruby 1.9.2p188 (2011-03-28 revision 31204) [x86_64-darwin10.7.0]


=begin
((|Fiber.current|)) can cause a segfault during a GC cycle when used in threads. Please find attached a Ruby sample which should help to pinpoint the problem.

The coredump shows the following backtrace:

 ....
 #18 <signal handler called>
 #19 0x000000010004cce5 in mark_locations_array (objspace=0x100838000, x=0x101dc8ab0, n=649) at gc.c:1315
 #20 0x000000010004cea6 in gc_mark_locations (objspace=0x100838000, start=0x101dc8ab0, end=0x101dc9f00) at gc.c:1331
 #21 0x000000010004f3dc in rb_gc_mark_machine_stack (th=0x1008a1048) at gc.c:2235
 #22 0x0000000100177a4f in rb_thread_mark (ptr=0x1008a1048) at vm.c:1683
 #23 0x000000010018179a in cont_mark (ptr=0x1008a1000) at cont.c:88
 #24 0x0000000100181947 in fiber_mark (ptr=0x1008a1000) at cont.c:168
 #25 0x000000010004dbb9 in gc_mark_children (objspace=0x100838000, ptr=4303907200, lev=1) at gc.c:1719
 #26 0x000000010004d3fb in gc_mark (objspace=0x100838000, ptr=4303907200, lev=0) at gc.c:1514
 #27 0x000000010004d428 in rb_gc_mark (ptr=4303907200) at gc.c:1520
 #28 0x000000010017797f in rb_thread_mark (ptr=0x10049fa00) at vm.c:1673
 ....

It looks like in frame 28 ((|rb_thread_mark|)) is called on a terminated thread, which is OK, but then it goes down to the fiber's continuation member ((|saved_thread|)) in frame 22. I think ((|saved_thread|)) holds an earlier snapshot (still running) of the same thread that we see in frame 28 (the thread_id values are equal), and because its machine stack pointers are stale, ((|rb_gc_mark_machine_stack|)) starts marking inaccessible memory.

One possible quick fix is to set the machine stack pointers to 0 in ((|cont_init()|)), given that the original thread will take care of its machine stack and free it as needed. It cures the segfaults, but I wonder whether it breaks some other code.
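
The idea behind this quick fix can be sketched with a heavily reduced model in plain C. Everything named demo_* is hypothetical and is not the real code in cont.c or gc.c; the sketch also assumes that the marker treats a cleared stack range as empty and scans nothing:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced model of a thread's machine stack range. */
typedef struct demo_thread {
    const long *machine_stack_start;
    const long *machine_stack_end;
} demo_thread_t;

/* Models a conservative stack scan: reads every word in the range and
 * returns how many words were visited. A cleared range is treated as
 * empty (assumption), so nothing is dereferenced. */
static size_t demo_mark_machine_stack(const demo_thread_t *th)
{
    size_t scanned = 0;
    const long *p;

    if (th->machine_stack_start == NULL || th->machine_stack_end == NULL) {
        return 0;
    }
    for (p = th->machine_stack_start; p < th->machine_stack_end; p++) {
        long word = *p;   /* would fault if the range were stale */
        (void)word;
        scanned++;
    }
    return scanned;
}

/* Models the quick fix: snapshot the thread, then clear the stack range
 * in the snapshot so it can never be scanned after it goes stale. */
static void demo_cont_init(demo_thread_t *saved, const demo_thread_t *th)
{
    assert(saved != NULL && th != NULL);
    *saved = *th;
    saved->machine_stack_start = NULL;
    saved->machine_stack_end = NULL;
}
```

In this model the live thread keeps its own range and is still scanned normally; only the snapshot stops contributing stack words to the mark phase.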

I was also wondering why a continuation holds a copy of the thread struct instead of a pointer to it. It's hard to follow the real thread life cycle correctly with ((|saved_thread|)). Could this cause harm in other cases like the one above?

BTW, while following the code around fibers and continuations, I've found another curious thing: ((|fiber_free()|)) calls ((|cont_free(&fib->cont)|)) on its cont member, and ((|cont_free()|)) calls ((|ruby_xfree()|)) on it. Is that OK, given that cont was allocated as part of the fiber? Don't we need to ((|ruby_xfree|)) the ((|fib|)) itself?

The attached patch is made against the ruby_1_9_2 branch; trunk seems to have the same segfault behavior.

=end



-- 
http://redmine.ruby-lang.org