In article <493f99d22c326_87b7b03b306038 / redmine.ruby-lang.org>,
  "coderrr ." <redmine / ruby-lang.org> writes:

> two threads reading from the same socket at same time produces race condition whichs locks interpreter
>
> info at: http://coderrr.wordpress.com/2008/12/10/mri-io-deadlock/

I investigated the problem.

It is caused when rb_thread_schedule schedules a thread
which is neither the two threads.

Assume threads X and Y reads from the same socket and
another thread Z.

When X and Y blocks in READ_CHECK and data arrived,
rb_thread_schedule makes one of the thread runnable.
I assume that X is choosen here.

If rb_thread_schedule choose X for next thread, there is no
problem.  X reads from the socket without problems.

But rb_thread_schedule may choose another thread Z.
Later, rb_thread_schedule is called from Z.
This call makes Y runnable because the data for the socket
is still available since no one read it yet.

Now, X and Y are runnable.

I assume X is choosen for next thread here.

X reads from the socket without problems.
After that, X calles rb_thread_schedule.

At this point, Y is still runnable.  So
Y is chosen by rb_thread_schedule.  But the data is not
available because X read it.  So read call in Y blocks the
interpreter.

I think rb_thread_schedule should not make a fd-waiting
thread runnable unless the thread is choosen for the next
thread.

Index: eval.c
===================================================================
--- eval.c	(revision 21047)
+++ eval.c	(working copy)
@@ -10961,6 +10961,7 @@ rb_thread_schedule()
     rb_thread_t next;		/* OK */
     rb_thread_t th;
     rb_thread_t curr;
+    rb_thread_t th_found = 0;
     int found = 0;
 
     fd_set readfds;
@@ -11110,10 +11111,10 @@ rb_thread_schedule()
 	    FOREACH_THREAD_FROM(curr, th) {
 		if ((th->wait_for&WAIT_FD) && FD_ISSET(th->fd, &readfds)) {
 		    /* Wake up only one thread per fd. */
+                    if (!th_found) {
+                        th_found = th;
+                    }
 		    FD_CLR(th->fd, &readfds);
-		    th->status = THREAD_RUNNABLE;
-		    th->fd = 0;
-		    th->wait_for = 0;
 		    found = 1;
 		}
 		if ((th->wait_for&WAIT_SELECT) &&
@@ -11143,9 +11144,15 @@ rb_thread_schedule()
 	    next = th;
 	    break;
 	}
-	if (th->status == THREAD_RUNNABLE && th->stk_ptr) {
-	    if (!next || next->priority < th->priority)
-	       next = th;
+	if ((th->status == THREAD_RUNNABLE || th == th_found) && th->stk_ptr) {
+	    if (!next || next->priority < th->priority) {
+                if (th == th_found) {
+		    th_found->status = THREAD_RUNNABLE;
+		    th_found->fd = 0;
+		    th_found->wait_for = 0;
+                }
+	        next = th;
+            }
 	}
     }
     END_FOREACH_FROM(curr, th);
-- 
Tanaka Akira