eregontp / gmail.com wrote:
> normalperson (Eric Wong) wrote:
> >  "git bisect" points to r63498 ("thread_pthread.c: enable
> >  thread cache by default"), which is HIGHLY unexpected.
> 
> So it seems it doesn't happen in Ruby <= 2.5 due to thread
> caching being disabled then?

Right; but that might just be "luck"...

> >  I'm not seeing how thread caching can be the cause of the
> >  bug, but rather it makes an existing bug more apparent by
> >  being faster and hitting the race sooner.
> 
> No idea either, but I guess a bug could lurk in there.  I did
> experience a few weird bugs when doing thread caching in
> TruffleRuby (e.g., inline caches based on pthread_self() are
> incorrect as multiple Ruby threads could use the same pthread
> thread)

At least for the core VM and standard library; I don't think
pthread_self is a factor; and we already clobber our TSD
pointer since r63836

> >  What thread-caching does do is make the MSPECOPT "-R"
> >  option noticeably faster
> 
> Right, because creating threads is so much faster.  Maybe the
> fact pthread startup is done each time has some effect on
> races (e.g., the pthread thread has a perfect view of the
> caller thread when starting, but not true with caching, but
> maybe the GVL still ensure that)

Yes, it might affect accounting done in the kernel; particularly
when bound to a single CPU.  And it still takes -R1000 to
reliably reproduce the problem with a single CPU; -R500 was
sometimes successful, even.

> > I think the only way to work reliably is to disable
> > interrupt checking when attempting to lock a mutex in
> > ensure:
> 
> At least if the interrupt causes to escape the function (e.g.,
> Thread#raise/Thread#kill), then that escape must be delayed
> after re-acquiring the Mutex.

Yes, so I've committed r64476

> In normal Mutex#lock and Mutex#synchronize, it's fine to
> raise/kill before taking the Mutex, because it's before
> entering the critical section, but unfortunately here we are
> inside the critical section.

So this is why pthread_mutex_lock is forbidden from returning
EINTR by POSIX.

Mutex#lock/synchronize should have been uninterruptible, too;
but it's too late to change that as it would break compatibility
with existing code (bootstraptest/test_thread.rb already breaks).

> > it could still affect user-experience if Ctrl-C takes a while
> 
> When would that happen?
> If some other Thread holds the Mutex too long?

Yes, but I suppose it's not a real problem.

> It's probably never a good idea to hold a Mutex for seconds or
> a long time, especially when using a ConditionVariable.
> It seems fair enough to delay the interrupt in such a case, as
> the main Thread is waiting for the Mutex.

Agreed(*).

> Maybe we could warn if re-acquiring takes too long and it
> becomes a problem in practice (I think it won't).

We already have deadlock checking to alert users to real
problems, so it's probably not a problem in practice.


(*) Along those lines, I think the "lock" idiom for GVL is not
    ideal for performance (kosaki rewrote the GVL for 1.9.3 to
    optimize for contention as a result).  I might try replacing
    the GVL with a platform-independent scheduler which limits
    parallelism to the data structures the GVL currently protects.

    Note: This will NOT solve the parallelism problem which
    exists in C Ruby; that is a much bigger task (but still
    doable with a scheduler, and perhaps even more so).

Unsubscribe: <mailto:ruby-core-request / ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>