Issue #14867 has been updated by k0kubun (Takashi Kokubun).


> I think we'll have to support non-blocking/event-based waitpid in Windows for auto-fiber/Thread::Light, anyways. So making MJIT worker thread be a Ruby Thread (or Thread::Light) would allow it to hook into process.c APIs. I don't think the existing code is too far off from being able to support Windows...

I see. Dropping MJIT worker by non-blocking/event-based waitpid would be an option once it's supported for auto-fiber/Thread::Light. At this moment, I'm not sure which is larger, synchronization cost between threads or translating ISeq to C code synchronously. So in a very short term, I wanna fix the 1 (waitpid) case with postponed_job unless it has a critical blocker to fix the issue. 

Note that the current synchronization design is already achieving the better benchmark result than one by copying inline caches on enqueueing an ISeq. To simplify the situation by stop using postponed_job, I experimented to have another global variable for the synchronization job which is to be checked and invoked by `RUBY_VM_CHECK_INTS`, but the benchmark result was not good. So for now using postponed_job is the only possible approach that achieves the current performance.

> rb_postponed_job_register only sets a flag, but doesn't wake up sleeping the thread in 1. or 2. by calling ubf.func (via rb_threadptr_interrupt).
>
> This is a tricky situation...
>
> Calling ubf.func is NOT async-signal-safe, so rb_postponed_job_register may not use it by default, either.

I'm not still understanding why this could be related to the deadlock 1 (waitpid on #system, #`, or Process.waitpid). I assume that the sleeping threads "1." and "2." should be waken up by something prepared in threads "1." and "2.", not by MJIT worker thread "3.". So the fact that rb_postponed_job_register (called by MJIT worker thread "3.") may not wakeup the thread "1." or "2." should not be the cause of this deadlock, in my understanding.

I actually don't understand how threads "1." and "2." are usually waken up. For waitpid thread "2.", I'm guessing that a signal handler (of SIGCHLD?) registered by the thread "2." itself somehow *replaces* native_ppoll_sleep and finishes it, but not sure.

> Also ruby_current_execution_context_ptr variable is unstable between setting ec->interrupt_flag (via RUBY_VM_SET_POSTPONED_JOB_INTERRUPT) and ubf.func calls since we make them without GVL
>
> This is a similar situation to [Bug #14939] r64062

I recognize this issue. For this issue, @ko1 suggested to repeatedly calling `rb_postponed_job_register_one` (that registers the same job at most only once) to survive the ec switches. I can fix this issue, but I'm NOT thinking that this could be the solution for this deadlock if unblocking thread "3." waiting for postponed job may not wake up threads "1." and "2." as I understand.

I *assume* that threads "1." and "2." are not waken up because something that should wake up "1." and "2." is not working somehow with MJIT enabled. At first @ko1 guessed that `rb_postponed_job_register` swaps `ec->interrupt_flag` in a bad way and a flag for the signal handler for waitpid is skipped to be set, but actually we're always using `ATOMIC_OR` for modifying `ec->interrupt_flag`, and so it may not be the cause either.

----------------------------------------
Bug #14867: Process.wait can wait for MJIT compiler process
https://bugs.ruby-lang.org/issues/14867#change-74636

* Author: k0kubun (Takashi Kokubun)
* Status: Assigned
* Priority: Normal
* Assignee: k0kubun (Takashi Kokubun)
* Target version: 
* ruby -v: 
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN
----------------------------------------
If Ruby tries to wait for any child process, MJIT's gcc/clang process could be caught by the method call. It's not convenient for both Ruby's user and MJIT worker thread, so Process.wait and its families should somehow avoid waiting for it.

---Files--------------------------------
0001-hijack-SIGCHLD-handler-for-internal-use.patch (13.8 KB)
JIT-test-all.log (39.9 KB)
mjit_test-all_63796.log (40.4 KB)
config_ruby-loco_mingw.log (27 KB)
test_jit_results.txt (41.2 KB)


-- 
https://bugs.ruby-lang.org/

Unsubscribe: <mailto:ruby-core-request / ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>