Issue #14659 has been updated by nbeyer / gmail.com (Nathan Beyer).


I've been unable to create a reduced example. I have not tried the suggested backports mentioned above, but if I get a chance, I will.

I did want to post that I worked around the issue by using Phusion Passenger's smart spawning hooks: https://www.phusionpassenger.com/library/indepth/ruby/spawn_methods/#smart-spawning-hooks. The code that starts the sub-system (which in turn creates the threads that were at issue) now registers with the smart spawning hook. When the hook is invoked, it restarts the sub-system, which in turn restarts the threads. I'm not sure why the hooks are required to restart the threads, rather than just restarting them dynamically when a dead thread is detected, but it resolves the issue for now.
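For anyone wanting to try the same workaround, here is a minimal sketch of its shape. `EventSubsystem`, `restart`, and `register_with_passenger` are hypothetical names standing in for my real code, and recreating the Mutex and ConditionVariable on restart is an assumption of the sketch, not something I have confirmed to be necessary. The only Passenger API used is `PhusionPassenger.on_event(:starting_worker_process)`, which is the hook described on the page linked above.

```ruby
# Hypothetical stand-in for the sub-system described above.
class EventSubsystem
  attr_reader :job

  # The block is called by the worker thread for each event.
  def initialize(&handler)
    @handler = handler
    restart
  end

  # Stops any old worker, then recreates the primitives and the thread.
  # Recreating @lock and @ready is an assumption of this sketch.
  def restart
    @job.kill if @job
    @lock = Mutex.new
    @ready = ConditionVariable.new
    @events = []
    @job = Thread.new { process }
  end

  def add(event)
    @lock.synchronize do
      @events << event
      @ready.signal
    end
  end

  private

  # Worker loop: waits for an event, then handles it outside the lock.
  def process
    loop do
      event = @lock.synchronize do
        @ready.wait(@lock) while @events.empty?
        @events.shift
      end
      @handler.call(event)
    end
  end
end

# Registers the restart with Passenger's smart spawning hook.
# Returns false (and does nothing) when Passenger is not loaded.
def register_with_passenger(subsystem)
  return false unless defined?(PhusionPassenger)

  PhusionPassenger.on_event(:starting_worker_process) do |forked|
    # forked is true only when the worker was created by smart spawning.
    subsystem.restart if forked
  end
  true
end
```

Outside of Passenger the registration is a no-op, so the same code can run under a plain `ruby` process or in tests.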

----------------------------------------
Bug #14659: segfault in ConditionVariable#broadcast and ConditionVariable#signal
https://bugs.ruby-lang.org/issues/14659#change-72385

* Author: nbeyer / gmail.com (Nathan Beyer)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: ruby 2.5.0p0 (2017-12-25 revision 61468) [x86_64-linux]
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN
----------------------------------------
I'm encountering a consistent segfault within a Rails application running Phusion Passenger on Ruby 2.5.0 and Ruby 2.5.1 when invoking either the #broadcast or #signal method on a ConditionVariable.

Here's what the code that interacts with the ConditionVariable looks like:

~~~ ruby
    def add(event)
      unless @job.alive?
        @lock.synchronize do
          unless @job.alive?
            @job = Thread.new { process }
          end
        end
      end

      @lock.synchronize do
        @events << event

        # this invocation causes the segfault
        @ready.broadcast
      end
    end
~~~

I can swap out the call to #broadcast with #signal and it causes the same segfault. Based on the C-level backtrace, I've narrowed it down to this: when using #broadcast, it segfaults on this line of code in thread_sync.c, inside the wakeup_all(struct list_head *head) function:

~~~ c
	if (cur->th->status != THREAD_KILLED) {
~~~

When using #signal, it segfaults on a similar line of code, but in the wakeup_one(struct list_head *head) function:

~~~ c
	if (cur->th->status != THREAD_KILLED) {
~~~

Here are the links to those lines of code on GitHub (for reference): https://github.com/ruby/ruby/blob/trunk/thread_sync.c#L46, https://github.com/ruby/ruby/blob/trunk/thread_sync.c#L30.

Looking at the history, it appears the condition variable code was substantially rewritten in this commit (https://github.com/ruby/ruby/commit/ea1ce47fd7f2bc9023e9a1391dbadcfaf9e892ce), which only made it into the 2.5.x line of code. This segfault doesn't happen with anything prior to 2.5.0.

I'm wondering if there is some relation to how Phusion Passenger forks processes for the purposes of Smart Spawning (https://www.phusionpassenger.com/library/indepth/ruby/spawn_methods/). My code is restarting any dead threads, but I don't think there's anything else it can do.
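As background for the forking question: in CRuby, only the thread that calls fork keeps running in the child process, so any worker thread started before the fork shows up as dead there. The sketch below only illustrates that behavior; it is not a reproduction of the segfault, and `thread_alive_in_forked_child?` is a hypothetical helper name. It assumes a platform where `fork` is available (e.g. Linux).

```ruby
# Reports whether a thread started in the parent is still alive
# when inspected from inside a forked child process.
def thread_alive_in_forked_child?
  worker = Thread.new { sleep } # stays alive in the parent until killed
  reader, writer = IO.pipe

  pid = fork do
    reader.close
    # Inside the child, only the forking thread is still running.
    writer.write(worker.alive? ? "alive" : "dead")
    writer.close
    exit!(0) # skip at_exit handlers inherited from the parent
  end

  writer.close
  answer = reader.read
  reader.close
  Process.wait(pid)

  worker.kill
  worker.join
  answer == "alive"
end
```

This is why `@job.alive?` returns false in a smart-spawned worker and the `add` method above ends up creating a new thread there.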

I'm still working on trimming the code down to a very small, reproducible example, but I wanted to post all of this information in the hope that someone who's familiar with the internals of thread_sync.c might be able to point out some additional evidence.





-- 
https://bugs.ruby-lang.org/

Unsubscribe: <mailto:ruby-core-request / ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>