eregontp / gmail.com wrote:
> I think we need to give some thinking on this,
> and I don't want to stress you to fix it to fix the build.
> (although this should be solved soon IMHO, latest before the next preview/RC)

It needs to be fixed ASAP.

r64467 seems to make CI happy, at least (along with r64398,
which was an absolutely necessary fix).  The key difference with
r64467 fix is I figured out I needed to reproduce the failure
reliably by using a single CPU (schedtool -a 0x1 -e ...)

> So I propose to "tag" the spec so it won't run in CI, so you can focus on it when there is time.
> I'll do that tomorrow if you agree.

Please don't.  CI is happy at the moment, at least.

> It would be also be interesting to see if Ruby <= 2.5 actually
> ensured the semantics I described above.

I was able to reproduce the problem with the old
timer-thread using "schedtool -a 0x1".

"git bisect" points to r63498 ("thread_pthread.c: enable thread
cache by default"), which is HIGHLY unexpected.

I'm not seeing how thread caching can be the cause of the bug,
but rather it makes an existing bug more apparent by being
faster and hitting the race sooner.

The thread cache only operates at the pthreads level, so there's
no way any Ruby-level interrupt could hit across Ruby Thread
lifetimes.  pthread_kill is never used in the
Mutex/ConditionVariable code paths; and different pthreads
condvars are used for sleeping/wakeups.  So it is a mystery as
to how thread caching ends up affecting Ruby-level interrupt
handling...

What thread-caching does do is make the MSPECOPT "-R" option
noticeably faster

Fwiw, I can still reproduce the failures with low timeouts:

```
diff --git a/thread_pthread.c b/thread_pthread.c
index 2fd60ddd4a..e8da3ee9c2 100644
--- a/thread_pthread.c
+++ b/thread_pthread.c
@@ -187,7 +187,7 @@ do_gvl_timer(rb_vm_t *vm, rb_thread_t *th)
 
     if (vm->gvl.timer_err == ETIMEDOUT) {
         ts.tv_sec = 0;
-        ts.tv_nsec = TIME_QUANTUM_NSEC;
+        ts.tv_nsec = 1;
         ts = native_cond_timeout(&nd->cond.gvlq, ts);
     }
     vm->gvl.timer = th;
```

So there's definitely more to investigate until I'm satisified
with the reliability of this (but I'm too tired, now).

Unsubscribe: <mailto:ruby-core-request / ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>