Feature #3176: Thread#priority= should actually do something
http://redmine.ruby-lang.org/issues/show/3176

Author: caleb clausen
Status: Open, Priority: Normal
Category: core

Currently, Thread#priority= doesn't seem to do anything useful. See #1169 for more discussion of this. Here's a short program which demonstrates Thread#priority= not working, adapted from one in that issue:

$ cat thrprio.rb 
c1 = c2 = 0
go=nil
t2 = Thread.new { 1 until go; loop { c2 += 1 } }
t2.priority = -2
t1 = Thread.new { 1 until go; loop { c1 += 1 } }
t1.priority = -1
go=true
sleep 5
t1.kill
t2.kill
puts "#{c1} #{c2} #{(c1-c2).to_f/(c1+c2)} #{c1 > c2}"

$ ruby thrprio.rb 
17102855 17276166 -0.005041184855147562 false
$ ruby thrprio.rb 
16839456 16977800 -0.0040909291989864585 false
$ ruby thrprio.rb 
17063114 17248978 -0.0054168658675781125 false
$ ruby thrprio.rb 
16809137 17019727 -0.006225157309450296 false

When I run it, the 2 counts have very nearly the same value... to within <1%. I see c1 < c2 almost always. Occasionally, I see c2 > c2. If thread priorities are working, c1 should be much larger than c2. (In ruby 1.8, in which Thread#priority= works, c1 is several orders of magnitude larger than c2.) 

Depending on your OS and what order the threads are started in, you may see the reverse situation, where c1 > c2, but occasionally c1 < c2. This is still wrong, tho. The difference between c1 and c2 should be very large.

In the discussion below, GIL means the global interpreter lock (global_vm_lock).

Also, keep in mind that every mutex has inside it somewhere a queue which holds a list of the threads waiting for the mutex.

I have a theory that there's an interaction between the scheduler and the GIL which causes priorities to be effectively ignored. Imagine this:

There are 2 threads running, A and B. A.priority > B.priority.
Initially, A runs (so it holds GIL).
B tries to run, but has to wait on GIL. 
After a short time, the scheduler forces a context switch.
So now A unlocks GIL, planning to immediately lock it again. This yields time to other threads.
But before A can lock GIL, B is scheduled (since it was at the head of the queue waiting for the GIL). 
Now B owns GIL.
A tries to lock GIL, but can't obtain it, so it waits for it to be available.
After another timeslice, B yields time back to A in the same way.
...and so on

So, A and B are effectively alternating timeslices, and each gets roughly equal amounts of time. Even tho A should have a higher priority, and should get much more time. If the OS uses priority queues rather than normal fifo queues for the internal queue inside a mutex, then there would be no problem. However, I strongly suspect that most operating systems use fifo queues within their mutex implementations.

So far, this is a theory. I have no certain proof. I have poked around in the pthreads implementation inside glibc, and concluded that mutexes in glibc do seem to use fifo queues (not priority queues) internally.

I had assumed that thread_timer() in thread_pthread.c is what causes thread switches to happen... however, as I look at it more closely, I now suspect that that is not the case. I don't know where context switches are going on... but I still think the overall theory is correct.

If I'm right, then this might be fixable by making timeslices variable length, instead of always 10 ms. Lower priority threads would get shorter timeslices. Alternatively, high priority threads could get several timeslices in a row, and lower priority threads only one. There may well be even better ways to address the problem.

Yusuke, I've tried to make my explanation as clear as possible. Please let me know if it's still hard to understand.


----------------------------------------
http://redmine.ruby-lang.org