Issue #16288 has been updated by davidw (David Welton).


mame (Yusuke Endoh) wrote:
> Thank you for the report and the great investigation!  I could reproduce the issue by using your example: https://github.com/mainameiz/segfault_app

Thanks for checking in to it.

> Currently, starting a thread in a finalizer is dangerous.  The termination of the interpreter is: (1) kill all threads except the main thread, (2) run all finalizers, and (3) destruct all (including all mutexes, the main thread, timer thread, VM itself, etc.).  If a finalizer creates a thread, it starts running during or after (3), which leads to a catastrophic situation.

Yes, I think there are multiple parts to this bug. It makes sense to me that starting threads in a finalizer like the concurrenty-ruby gem is doing is not correct.

> An easy solution is to prohibit thread creation after the process (3) is started.  The following patch fixes the segfault of your example.

Yes, that prevents the segfault in my test application as well.

> However, as the last two lines show, Timeout cannot be used safely in a finalizer.  I'm unsure if it is acceptable, but to support thread creation in a finalizer, we need to revamp the termination process.  @ko1 and @nobu, what do you think?
> 
> @davidw I could be wrong as I don't understand your statement about thread_join, but I couldn't see the behavior by running your example under gdb.  Anyways, thanks for the great investigation.  It is really helpful.

Apologies, I was just kind of guessing at what was going on - I am not at all familiar with Ruby's internals! I think the interaction is kind of complex, as you can see from the simple code I posted in the first bug report, which only tries to launch a thread in a finalizer - and fails. It's strange that the timeout.rb code is able to successfully create a thread.

Thank you for your prompt response and looking into the problem.



----------------------------------------
Bug #16288: Segmentation fault with finalizers, threads
https://bugs.ruby-lang.org/issues/16288#change-82460

* Author: davidw (David Welton)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: ruby 2.6.6p116 (2019-10-02 revision 67825) [x86_64-linux]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
Hi,

This is a tricky one and I am still working on narrowing it down, but I will report what I have so far.

I compiled a version of 2_6_6 from github: ruby 2.6.6p116 (2019-10-02 revision 67825) [x86_64-linux]

I have a minimal Rails project that uses Mongoid. It crashes with a segmentation fault when rspec runs. The concurrent ruby gem is in some way involved, and I have been posting there: https://github.com/ruby-concurrency/concurrent-ruby/issues/808

However, I think there is a deeper problem - I would not expect a user level script to cause a segmentation fault.

I have been putting a lot of debugging statements in, and turned on Thread.DEBUG, and have noticed some things. I am not experienced with Ruby's internals, so some of these bits of data might be normal or irrelevant:

* The concurrent-ruby gem uses ObjectSpace.define_finalizer to set a finalizer
* That finalizer creates a new Thread
* However, it appears as if that thread is running after the main thread is already dead, so code that expects to reference the main thread crashes, because it's a NULL reference.

I tried the following test code:

```
class Foo
  def initialize
    ObjectSpace.define_finalizer(self, proc do
                                   Foo.foo_finalizer
                                 end)
  end

  def bar
    puts 'bar'
  end

  def Foo.foo_finalizer
    puts "foo_finalizer"
    t = Thread.new do
      puts "Thread reporting for duty"
    end
    puts "foo_finalizer thread launched"
    sleep 5
  end
end

f = Foo.new
f.bar
f = nil
```

While trying to develop a simple test case to demonstrate the problem. It triggers rb_raise(rb_eThreadError, "can't alloc thread"); in thread_s_new, because it looks like the main thread has already been marked as 'killed' in this case. When I check the main thread status in thread_s_new with the above code, it reports 'dead'.

When I run my rspec code in the sample Rails project, thread_s_new shows the main thread's status as 'run' even if it should be dead?

I have seen some debugging things that shows some exceptions and thread_join interrupts and so on.

Is it possible that something like this is happening?

Main thread starts doing a cleanup, and gets an exception or something that generates an interrupt, and its KILLED status gets reset to RUNNABLE

Then, in the finalizer, it starts creating a Thread, but at this point the main thread actually does get killed, and when that finalizer thread tries to run it runs into a null reference?

I can provide the Rails sample project if needs be.

Sorry if any of the above isn't clear; I've been staring at the C code for several hours and am a bit cross-eyed!

Thank you for any insights.





-- 
https://bugs.ruby-lang.org/

Unsubscribe: <mailto:ruby-core-request / ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>