Issue #16288 has been updated by mame (Yusuke Endoh).


Thank you for the report and the great investigation!  I could reproduce the issue by using your example: https://github.com/mainameiz/segfault_app

Currently, starting a thread in a finalizer is dangerous.  The termination of the interpreter is: (1) kill all threads except the main thread, (2) run all finalizers, and (3) destruct all (including all mutexes, the main thread, timer thread, VM itself, etc.).  If a finalizer creates a thread, it starts running during or after (3), which leads to a catastrophic situation.

An easy solution is to prohibit thread creation after the process (3) is started.  The following patch fixes the segfault of your example.

```diff
diff --git a/thread.c b/thread.c
index eff5d39b51..fc609907ef 100644
--- a/thread.c
+++ b/thread.c
@@ -833,6 +833,11 @@ thread_create_core(VALUE thval, VALUE args, VALUE (*fn)(void *))
 	rb_raise(rb_eThreadError,
 		 "can't start a new thread (frozen ThreadGroup)");
     }
+    if (current_th->vm->main_thread->status == THREAD_KILLED) {
+        rb_warn("can't start a new thread after the main thread has stopped");
+        rb_raise(rb_eThreadError,
+                 "can't start a new thread (the main thread has already terminated)");
+    }
 
     if (fn) {
         th->invoke_type = thread_invoke_type_func;
```

By this patch, your example ends gracefully.

```
$ bundle exec rspec spec/models/user_spec.rb
/home/mame/work/ruby/local/lib/ruby/gems/2.7.0/gems/mongoid-7.0.5/lib/mongoid.rb:104: warning: The last argument is used as the keyword parameter
/home/mame/work/ruby/local/lib/ruby/gems/2.7.0/gems/activesupport-6.0.0/lib/active_support/core_ext/module/delegation.rb:171: warning: for `delegate' defined here
config.eager_load is set to nil. Please update your config/environments/*.rb files accordingly:

  * development - set it to false
  * test - set it to false (unless you use a tool that preloads your test environment)
  * production - set it to true

/home/mame/work/ruby/local/lib/ruby/gems/2.7.0/gems/tzinfo-1.2.5/lib/tzinfo/ruby_core_support.rb:142: warning: The last argument is used as the keyword parameter
/home/mame/work/ruby/local/lib/ruby/gems/2.7.0/gems/tzinfo-1.2.5/lib/tzinfo/ruby_core_support.rb:142: warning: The last argument is used as the keyword parameter
/home/mame/work/ruby/local/lib/ruby/gems/2.7.0/gems/actionpack-6.0.0/lib/action_dispatch/middleware/stack.rb:37: warning: The last argument is used as the keyword parameter
/home/mame/work/ruby/local/lib/ruby/gems/2.7.0/gems/actionpack-6.0.0/lib/action_dispatch/middleware/static.rb:110: warning: for `initialize' defined here
..

Finished in 0.00977 seconds (files took 1.22 seconds to load)
1 example, 0 failures

/home/mame/work/ruby/local/lib/ruby/2.7.0/timeout.rb:85: warning: can't start a new thread on a finalizer
/home/mame/work/ruby/local/lib/ruby/2.7.0/timeout.rb:85: warning: can't start a new thread on a finalizer
```

However, as the last two lines show, Timeout cannot be used safely in a finalizer.  I'm unsure if it is acceptable, but to support thread creation in a finalizer, we need to revamp the termination process.  @ko1 and @nobu, what do you think?

@davidw I could be wrong as I don't understand your statement about thread_join, but I couldn't see the behavior by running your example under gdb.  Anyways, thanks for the great investigation.  It is really helpful.

----------------------------------------
Bug #16288: Segmentation fault with finalizers, threads
https://bugs.ruby-lang.org/issues/16288#change-82438

* Author: davidw (David Welton)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: ruby 2.6.6p116 (2019-10-02 revision 67825) [x86_64-linux]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
Hi,

This is a tricky one and I am still working on narrowing it down, but I will report what I have so far.

I compiled a version of 2_6_6 from github: ruby 2.6.6p116 (2019-10-02 revision 67825) [x86_64-linux]

I have a minimal Rails project that uses Mongoid. It crashes with a segmentation fault when rspec runs. The concurrent ruby gem is in some way involved, and I have been posting there: https://github.com/ruby-concurrency/concurrent-ruby/issues/808

However, I think there is a deeper problem - I would not expect a user level script to cause a segmentation fault.

I have been putting a lot of debugging statements in, and turned on Thread.DEBUG, and have noticed some things. I am not experienced with Ruby's internals, so some of these bits of data might be normal or irrelevant:

* The concurrent-ruby gem uses ObjectSpace.define_finalizer to set a finalizer
* That finalizer creates a new Thread
* However, it appears as if that thread is running after the main thread is already dead, so code that expects to reference the main thread crashes, because it's a NULL reference.

I tried the following test code:

```
class Foo
  def initialize
    ObjectSpace.define_finalizer(self, proc do
                                   Foo.foo_finalizer
                                 end)
  end

  def bar
    puts 'bar'
  end

  def Foo.foo_finalizer
    puts "foo_finalizer"
    t = Thread.new do
      puts "Thread reporting for duty"
    end
    puts "foo_finalizer thread launched"
    sleep 5
  end
end

f = Foo.new
f.bar
f = nil
```

While trying to develop a simple test case to demonstrate the problem. It triggers rb_raise(rb_eThreadError, "can't alloc thread"); in thread_s_new, because it looks like the main thread has already been marked as 'killed' in this case. When I check the main thread status in thread_s_new with the above code, it reports 'dead'.

When I run my rspec code in the sample Rails project, thread_s_new shows the main thread's status as 'run' even if it should be dead?

I have seen some debugging things that shows some exceptions and thread_join interrupts and so on.

Is it possible that something like this is happening?

Main thread starts doing a cleanup, and gets an exception or something that generates an interrupt, and its KILLED status gets reset to RUNNABLE

Then, in the finalizer, it starts creating a Thread, but at this point the main thread actually does get killed, and when that finalizer thread tries to run it runs into a null reference?

I can provide the Rails sample project if needs be.

Sorry if any of the above isn't clear; I've been staring at the C code for several hours and am a bit cross-eyed!

Thank you for any insights.





-- 
https://bugs.ruby-lang.org/

Unsubscribe: <mailto:ruby-core-request / ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>