Issue #13794 has been updated by catphish (Charlie Smurthwaite).


> Can you also check the value of timer_thread_pipe.owner_process?


(gdb) print timer_thread_pipe.writing
$1 = 1
(gdb) print timer_thread_pipe.owner_process
$2 = 0

(gdb) info threads
  Id   Target Id         Frame 
  2    Thread 0x7f1f98a2f700 (LWP 19597) "ruby-timer-thr" 0x00007f1f976e9c5d in poll ()
    at ../sysdeps/unix/syscall-template.S:81
* 1    Thread 0x7f1f98a24740 (LWP 19595) "ruby" 0x00007f1f976c81d7 in sched_yield ()
    at ../sysdeps/unix/syscall-template.S:81


----------------------------------------
Bug #13794: Infinite loop of sched_yield
https://bugs.ruby-lang.org/issues/13794#change-66184

* Author: catphish (Charlie Smurthwaite)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: ruby 2.3.4p301 (2017-03-30 revision 58214) [x86_64-linux]
* Backport: 2.2: UNKNOWN, 2.3: UNKNOWN, 2.4: UNKNOWN
----------------------------------------
I have been encountering an issue where processes hang in an infinite loop calling sched_yield(). The looping code can be found at https://github.com/ruby/ruby/blob/v2_3_4/thread_pthread.c#L1663

while (ATOMIC_CAS(timer_thread_pipe.writing, (rb_atomic_t)0, 0)) {
  native_thread_yield();
}

I believe that, by some mechanism I have not been able to identify, timer_thread_pipe.writing is incremented but never decremented, causing this loop to run forever.

I am not able to create a reproducible test case, but this issue occurs regularly in my production application. I have attached backtraces and thread lists from two processes exhibiting this behaviour. gdb confirms that timer_thread_pipe.writing = 1 in these processes.

One possible cause is that rb_thread_wakeup_timer_thread() or rb_thread_wakeup_timer_thread_low() is called and, before it returns, another thread calls fork(). The child inherits the incremented value of timer_thread_pipe.writing, but the thread that would normally decrement it is left behind in the parent and does not exist in the child.
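
To illustrate the suspected race outside of Ruby, here is a minimal sketch in plain C. It is not Ruby's code: `writing` is a hypothetical stand-in for timer_thread_pipe.writing, `writer` models a thread that is part-way through the wakeup function, and the two pipes exist only to force the interleaving deterministically. It assumes POSIX threads and C11 atomics.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for timer_thread_pipe.writing. */
static atomic_int writing;

/* Pipes used only to make the thread interleaving deterministic. */
static int ready_pipe[2];
static int release_pipe[2];

/* Models a thread inside the wakeup function: it increments the counter,
 * is held "mid-write", and decrements only once it is released. */
static void *writer(void *arg)
{
    char c = 1;
    ssize_t n;
    (void)arg;
    atomic_fetch_add(&writing, 1);
    n = write(ready_pipe[1], &c, 1);   /* signal: counter is now incremented */
    n = read(release_pipe[0], &c, 1);  /* block until the main thread releases us */
    (void)n;
    atomic_fetch_sub(&writing, 1);
    return NULL;
}

/* Forks while the writer thread is mid-write and returns the value of
 * `writing` observed in the child, or -1 on error. */
static int writing_seen_by_child(void)
{
    pthread_t th;
    char c = 0;
    int status = 0;
    pid_t pid;
    ssize_t n;

    atomic_store(&writing, 0);
    if (pipe(ready_pipe) != 0 || pipe(release_pipe) != 0) return -1;
    if (pthread_create(&th, NULL, writer, NULL) != 0) return -1;
    if (read(ready_pipe[0], &c, 1) != 1) return -1;  /* wait for the increment */

    pid = fork();
    if (pid == 0) {
        /* Child: only the forking thread exists here, so nothing will
         * ever decrement the copied counter. Report it via exit status. */
        _exit(atomic_load(&writing));
    }

    /* Parent: release the writer so it can decrement and finish. */
    n = write(release_pipe[1], &c, 1);
    (void)n;
    pthread_join(th, NULL);
    close(ready_pipe[0]); close(ready_pipe[1]);
    close(release_pipe[0]); close(release_pipe[1]);

    if (pid < 0 || waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling writing_seen_by_child() from a program built with -pthread should show the child observing a non-zero counter while the parent's counter returns to 0. A loop in the child that waits for the counter to reach 0 would therefore never exit.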

If this is correct, one solution would be to reset timer_thread_pipe.writing to 0 in native_reset_timer_thread() immediately after a fork.
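
The idea of the proposed fix can be sketched in plain C as follows. This is an assumption-laden illustration, not a patch: `writing` is a hypothetical stand-in for timer_thread_pipe.writing, and a pthread_atfork() child handler is used here in place of native_reset_timer_thread(), which is where the reset would actually go in Ruby.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for timer_thread_pipe.writing. */
static atomic_int writing;

/* Runs in the child right after fork(): no other thread survived the
 * fork, so any in-flight increment is stale and can be discarded. */
static void reset_writing_in_child(void)
{
    atomic_store(&writing, 0);
}

/* Registers the child-side reset. Returns 0 on success. */
static int install_fork_reset(void)
{
    return pthread_atfork(NULL, NULL, reset_writing_in_child);
}

/* Simulates a stale increment, forks, and returns the value of `writing`
 * observed in the child, or -1 on error. */
static int child_sees_after_fork(void)
{
    int status = 0;
    pid_t pid;

    atomic_store(&writing, 1);  /* pretend another thread is mid-write */
    pid = fork();
    if (pid == 0) {
        _exit(atomic_load(&writing));
    }
    if (pid < 0 || waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With the handler installed, the child starts with the counter at 0 while the parent's value is left untouched, so a wait loop in the child would terminate immediately.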

Other examples of similar bugs being reported:
https://github.com/resque/resque/issues/578
https://github.com/zk-ruby/zk/issues/50

---Files--------------------------------
backtrace_1.txt (14 KB)
backtrace_2.txt (10.9 KB)

