Issue #13794 has been updated by gregoryp (Gregory Potamianos).

File sched_yield_loop.rb added

Hi,

We are also affected by this bug, running Debian's ruby 2.3.3p222. We are encountering it with Exec resources of puppet agents timing out and leaving stray processes on busyloop. We have tried to reproduce it but not with much success. The attached simple fork/exec script seems to reproduce it sporadically if run like `while true; do nice -n19 ruby sched_yield_loop.rb; [ $? -ne 0 ] && break; done`. It eventually raises a timeout error and leaves a child behind spinning on sched_yield.

catphish (Charlie Smurthwaite) wrote:
> Hi Eric,
> 
> I have been testing your original patch (just the PID check) for a couple of days and it appears to have resolved the problem. I will report on this again in 1 week as the issue occurs quite randomly but I am currently hopeful. Thank you very much again for looking into this.
> 
> Charlie

We have been running ruby with this patch for 2 weeks now and it seems to solve the problem for us also. Is there any chance of this being merged/backported to ruby 2.3?

Regards,
Gregory

----------------------------------------
Bug #13794: Infinite loop of sched_yield
https://bugs.ruby-lang.org/issues/13794#change-66992

* Author: catphish (Charlie Smurthwaite)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: ruby 2.3.4p301 (2017-03-30 revision 58214) [x86_64-linux]
* Backport: 2.2: UNKNOWN, 2.3: UNKNOWN, 2.4: UNKNOWN
----------------------------------------
I have been encountering an issue with processes hanging in an infinite loop of calling sched_yield(). The looping code can be found at https://github.com/ruby/ruby/blob/v2_3_4/thread_pthread.c#L1663

while (ATOMIC_CAS(timer_thread_pipe.writing, (rb_atomic_t)0, 0)) {
  native_thread_yield();
}

It is my belief that by some mechanism I have not been able to identify, timer_thread_pipe.writing is incremented but it never decremented, causing this loop to run infinitely.

I am not able to create a reproducible test case, however this issue occurs regularly in my production application. I have attached backtraces and thread lists from 2 processes exhibiting this behaviour. gdb confirms that timer_thread_pipe.writing = 1 in these processes.

I believe one possibility of the cause is that rb_thread_wakeup_timer_thread() or rb_thread_wakeup_timer_thread_low() is called, and before it returns, another thread calls fork(), leaving the value of timer_thread_pipe.writing incremented, but leaving behind the thread that would normally decrement it.

If this is correct, one solution would be to reset timer_thread_pipe.writing to 0 in native_reset_timer_thread() immediately after a fork.

Other examples of similar bugs being reported:
https://github.com/resque/resque/issues/578
https://github.com/zk-ruby/zk/issues/50

---Files--------------------------------
backtrace_1.txt (14 KB)
backtrace_2.txt (10.9 KB)
sched_yield_1.patch (738 Bytes)
sched_yield_loop.rb (212 Bytes)


-- 
https://bugs.ruby-lang.org/

Unsubscribe: <mailto:ruby-core-request / ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>