charlie / atech.media wrote:
> Bug #13794: Infinite loop of sched_yield
> https://bugs.ruby-lang.org/issues/13794
> ----------------------------------------
> I have been encountering an issue with processes hanging in an infinite loop of calling sched_yield(). The looping code can be found at https://github.com/ruby/ruby/blob/v2_3_4/thread_pthread.c#L1663
> 
> while (ATOMIC_CAS(timer_thread_pipe.writing, (rb_atomic_t)0, 0)) {
>   native_thread_yield();
> }
> 
> It is my belief that, by some mechanism I have not been able to identify, timer_thread_pipe.writing is incremented but never decremented, causing this loop to run infinitely.
> 
> I am not able to create a reproducible test case; however, this issue occurs regularly in my production application. I have attached backtraces and thread lists from 2 processes exhibiting this behaviour. gdb confirms that timer_thread_pipe.writing = 1 in these processes.

Can you also check the value of timer_thread_pipe.owner_process?

> I believe one possible cause is that rb_thread_wakeup_timer_thread() or rb_thread_wakeup_timer_thread_low() is called and, before it returns, another thread calls fork(). The child inherits the incremented value of timer_thread_pipe.writing, but not the thread that would normally decrement it.

That is a likely possibility.
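
To illustrate that race, here is a minimal standalone sketch. It is not
Ruby's code: the names `writing`, `release_writer`, `writer` and
`writing_seen_by_child` are made up for the example, and a busy-wait
stands in for the in-progress write(2). One thread increments the
counter and stalls, another thread forks, and the child reports the
counter value it inherited.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static atomic_int writing;        /* stand-in for timer_thread_pipe.writing */
static atomic_int release_writer; /* hypothetical: lets the writer finish */

static void *writer(void *arg)
{
    (void)arg;
    atomic_fetch_add(&writing, 1);          /* enter the "writing" section */
    while (!atomic_load(&release_writer))   /* pretend write(2) is in progress */
        sched_yield();
    atomic_fetch_sub(&writing, 1);          /* leave the section */
    return NULL;
}

/* Returns the counter value observed in the forked child, or -1 on error. */
int writing_seen_by_child(void)
{
    pthread_t th;
    pid_t pid;
    int status = 0;

    atomic_store(&writing, 0);
    atomic_store(&release_writer, 0);
    if (pthread_create(&th, NULL, writer, NULL) != 0)
        return -1;
    while (atomic_load(&writing) == 0)      /* wait until the writer is mid-section */
        sched_yield();

    pid = fork();                           /* only the calling thread is copied */
    if (pid == 0)
        _exit(atomic_load(&writing));       /* child: report the inherited count */
    if (pid > 0)
        waitpid(pid, &status, 0);

    atomic_store(&release_writer, 1);       /* parent: let the writer finish */
    pthread_join(th, NULL);
    assert(atomic_load(&writing) == 0);     /* parent's count is balanced again */

    if (pid < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

In the child the writer thread no longer exists, so nothing ever
brings the counter back to 0, and a loop that waits for 0 would
yield forever.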

> If this is correct, one solution would be to reset timer_thread_pipe.writing to 0 in native_reset_timer_thread() immediately after a fork.

How about checking owner_process before incrementing?
Can you try the following patch to check owner_process?

   https://80x24.org/spew/20170809232533.14932-1-e@80x24.org/raw
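
Roughly, the idea of checking owner_process before incrementing looks
like the sketch below. This is only an illustration and not the patch
itself: `struct wake_pipe`, `guarded_wakeup` and the `wakeups` field
(which stands in for the actual write to the pipe) are hypothetical
names, and how a stale count inherited across fork() gets cleared is
not shown here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical, simplified stand-in for timer_thread_pipe. */
struct wake_pipe {
    pid_t owner_process;  /* pid that set up the pipe */
    atomic_int writing;   /* number of wakeups currently in progress */
    int wakeups;          /* stands in for write(2) on the pipe */
};

/* Returns 1 if a wakeup was sent, 0 if it was skipped. */
int guarded_wakeup(struct wake_pipe *p)
{
    assert(p != NULL);
    if (p->owner_process != getpid())
        return 0;                        /* not our pipe: leave the counter alone */
    atomic_fetch_add(&p->writing, 1);
    p->wakeups++;                        /* the real code would write to the pipe */
    atomic_fetch_sub(&p->writing, 1);
    return 1;
}
```

A process that does not own the pipe never touches the counter, so it
cannot leave it incremented.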

timer_thread_pipe.writing was introduced in August 2015 with r51576,
so this bug would definitely be my fault.

> Other examples of similar bugs being reported:
> https://github.com/resque/resque/issues/578
> https://github.com/zk-ruby/zk/issues/50

That also means these bugs from 2012 must have other causes, since
timer_thread_pipe.writing did not exist before r51576.


Thanks again for this report.
