Issue #14999 has been updated by Eregon (Benoit Daloze).


normalperson (Eric Wong) wrote:
> r64467 seems to make CI happy, at least

Great!

> I figured out I needed to reproduce the failure
>  reliably by using a single CPU (schedtool -a 0x1 -e ...)

I could also verify it fails before your commit, and passes after with:

    taskset -c 1 mspec-run -R1000 library/conditionvariable/wait_spec.rb
  
>  "git bisect" points to r63498 ("thread_pthread.c: enable thread
>  cache by default"), which is HIGHLY unexpected.

So it seems it doesn't happen in Ruby <= 2.5 due to thread caching being disabled then?
 
>  I'm not seeing how thread caching can be the cause of the bug,
>  but rather it makes an existing bug more apparent by being
>  faster and hitting the race sooner.

No idea either, but I guess a bug could lurk in there.
I did experience a few weird bugs when doing thread caching in TruffleRuby
(e.g., inline caches based on pthread_self() are incorrect as multiple Ruby threads could use the same pthread thread)

>  What thread-caching does do is make the MSPECOPT "-R" option
>  noticeably faster

Right, because creating threads is so much faster.
Maybe the fact pthread startup is done each time has some effect on races
(e.g., the pthread thread has a perfect view of the caller thread when starting,
but not true with caching, but maybe the GVL still ensure that)

> I think the only way to work reliably is to disable interrupt
> checking when attempting to lock a mutex in ensure:

At least if the interrupt causes to escape the function (e.g., Thread#raise/Thread#kill),
then that escape must be delayed after re-acquiring the Mutex.

In normal Mutex#lock and Mutex#synchronize, it's fine to raise/kill before taking the Mutex,
because it's before entering the critical section, but unfortunately here we are inside the critical section.

> it could still affect user-experience if Ctrl-C takes a while

When would that happen?
If some other Thread holds the Mutex too long?
It's probably never a good idea to hold a Mutex for seconds or a long time, especially when using a ConditionVariable.
It seems fair enough to delay the interrupt in such a case, as the main Thread is waiting for the Mutex.
Maybe we could warn if re-acquiring takes too long and it becomes a problem in practice (I think it won't).

----------------------------------------
Bug #14999: ConditionVariable doesn't reacquire the Mutex if Thread#kill-ed
https://bugs.ruby-lang.org/issues/14999#change-73615

* Author: Eregon (Benoit Daloze)
* Status: Closed
* Priority: Normal
* Assignee: normalperson (Eric Wong)
* Target version: 
* ruby -v: ruby 2.6.0dev (2018-08-16 trunk 64394) [x86_64-linux]
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN
----------------------------------------
These specs reproduce the issue:
https://raw.githubusercontent.com/oracle/truffleruby/master/spec/ruby/library/conditionvariable/wait_spec.rb

I can commit them, but of course they will fail.
Feel free to just copy it to MRI's spec/ruby/library/conditionvariable/wait_spec.rb
(I would do it when synchronizing specs anyway)

I'd guess it's caused by the recent timer thread changes or so.
This spec works fine on 2.5.1 and older.

Just copy-pasting the spec out allows for a standalone reproducer:

~~~ ruby
m = Mutex.new
cv = ConditionVariable.new
in_synchronize = false
owned = nil

th = Thread.new do
  m.synchronize do
    in_synchronize = true
    begin
      cv.wait(m)
    ensure
      owned = m.owned?
      $stderr.puts "\nThe Thread doesn't own the Mutex!" unless owned
    end
  end
end

# wait for m to acquire the mutex
Thread.pass until in_synchronize
# wait until th is sleeping (ie waiting)
Thread.pass while th.status and th.status != "sleep"

m.synchronize {
  cv.signal
  # Wait that the thread is blocked on acquiring the Mutex
  sleep 0.001
  # Kill the thread, yet the thread should first acquire the Mutex before going on
  th.kill
}

th.join
~~~

Result on trunk:
~~~
The Thread doesn't own the Mutex!
#<Thread:0x0000565216f4ba28 / cv_bug.rb:6 aborting> terminated with exception (report_on_exception is true):
Traceback (most recent call last):
	1: from cv_bug.rb:7:in `block in <main>'
cv_bug.rb:7:in `synchronize': Attempt to unlock a mutex which is not locked (ThreadError)
Traceback (most recent call last):
	1: from cv_bug.rb:7:in `block in <main>'
cv_bug.rb:7:in `synchronize': Attempt to unlock a mutex which is not locked (ThreadError)
~~~



-- 
https://bugs.ruby-lang.org/

Unsubscribe: <mailto:ruby-core-request / ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>