Issue #14867 has been updated by k0kubun (Takashi Kokubun).

Assignee changed from k0kubun (Takashi Kokubun) to normalperson (Eric Wong)

Today I took a deeper look at how rb_waitpid currently works. While I couldn't figure out exactly which part is wrong, my bets are:

* There's a race condition involving the SIGCHLD raised by the gcc/clang process spawned by MJIT, rb_waitpid, the signal handler for SIGCHLD, `RUBY_VM_CHECK_INTS`, and mjit_worker.c's `exec_process` (especially around `vm->waitpid_lock` and `ruby_waitpid_locked`), which keeps the main thread in `native_ppoll_sleep` from ever waking up.
* That had been *hidden* until the MJIT postponed_job was introduced and MJIT stopped spuriously waking up the main thread with SIGCHLD while the main thread is in `native_ppoll_sleep`.

Given that, your description

> Following how mjit_worker.c currently works, rb_f_system now ensures the VM-wide waitpid lists is locked before creating a new process via fork/vfork.

and the patch that locks `waitpid_lock` before `fork` made sense to me. 

While I'm not sure exactly which thread interleaving would let rb_waitpid callers steal work, your patch seems to improve the situation. Let's commit it and see what happens to trunk-mjit and trunk-mjit-wait on ko1's CIs, at least for the failures inside `rb_f_system`.

The recent failure is http://ci.rvm.jp/results/trunk-mjit@silicon-docker/1435576, which hangs in `Process.wait2` on the pid created by `Kernel#spawn` in `EnvUtil.invoke_ruby` and appears more frequently than the #system one. For this case, I guess changes similar to those in `rb_f_system` (creating and passing waitpid_state) are needed for `rb_f_spawn` (and all the related functions that call `rb_execarg_spawn`) as well?

By the way, thanks for taking a look at this. It would have taken a lot of time if I had tried to resolve this alone.

----------------------------------------
Bug #14867: Process.wait can wait for MJIT compiler process
https://bugs.ruby-lang.org/issues/14867#change-74658

* Author: k0kubun (Takashi Kokubun)
* Status: Assigned
* Priority: Normal
* Assignee: normalperson (Eric Wong)
* Target version: 
* ruby -v: 
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN
----------------------------------------
If Ruby tries to wait for any child process, MJIT's gcc/clang process could be caught by the method call. That is inconvenient for both Ruby users and the MJIT worker thread, so Process.wait and its family of methods should somehow avoid waiting for it.
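
A minimal sketch of the symptom, using an ordinary child process as a stand-in for MJIT's compiler process. The method name `demonstrate_stolen_wait` and both child commands are made up for illustration; this only mimics the situation from user code and does not exercise MJIT itself.

```ruby
require "rbconfig"

# Hypothetical demo: `internal_pid` stands in for MJIT's gcc/clang process.
def demonstrate_stolen_wait
  reader, writer = IO.pipe

  # "User" child: blocks on stdin until the parent closes `writer`.
  user_pid = Process.spawn(RbConfig.ruby, "-e", "$stdin.read", in: reader)
  reader.close

  # "Internal" child: exits right away.
  internal_pid = Process.spawn(RbConfig.ruby, "-e", "exit 0")

  # Waiting for *any* child can only reap the internal one here,
  # because the user child is still blocked on its stdin.
  reaped_pid = Process.wait

  # The owner of the internal child can no longer wait for it.
  owner_error =
    begin
      Process.wait(internal_pid)
      nil
    rescue Errno::ECHILD => e
      e
    end

  # Let the user child finish and reap it by pid.
  writer.close
  Process.wait(user_pid)

  { internal_pid: internal_pid, reaped_pid: reaped_pid, owner_error: owner_error }
end
```

In the returned hash, `:reaped_pid` equals `:internal_pid` and `:owner_error` is an `Errno::ECHILD`: the caller that wanted "any child" took the internal process, and the code that owned that process has nothing left to wait for.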

---Files--------------------------------
0001-hijack-SIGCHLD-handler-for-internal-use.patch (13.8 KB)
JIT-test-all.log (39.9 KB)
mjit_test-all_63796.log (40.4 KB)
config_ruby-loco_mingw.log (27 KB)
test_jit_results.txt (41.2 KB)

