I recently was involved in trying to get a multi-threaded ruby motion
control application to handle a SIGINT trap reliably.  After digging
around a bit in eval.c, it became clear that:

   1)  Ruby trap handlers always run on Thread.main

   2)  Ruby trap handlers may be entered even if Thread.critical is set.

Most of my trouble stemmed from not realizing that traps always ran on
Thread.main.  Once I was explicit about the thread(s) in which the trap
handler raised exceptions, everything become _MUCH_ more predicable.
In retrospect, I suppose this behavior should have been obvious.
Sigh.

(Is the fact the traps run on Thread.main documented anywhere?)

But, I could not find a way to script around the fact that Ruby trap
handlers may interrupt Ruby threads in critical sections.  I suppose
that the intent here was to insure ruby scripts don't "lock up" in a
poorly coded critical section.  However, if one can abort a
critical section, how can one recover from that interruption and
continue processing?  Strictly speaking, once a critical section
is aborted, one must assume that the data that section manipulates are 
corrupted.

Unless I am missing something fundamental here, to make Thread.critical
work reliably, one must modify the Ruby interpreter such that no
asynchronous event can interrupt a ruby Thread while Thread.critical is
set. This means that a thread that spins in a critical section can cause 
Ruby to cease responding to traps. I feel this is a small price to pay 
for the ability to handle traps deterministically.

Another problem I noticed while poking around in the code was that
rb_thread_schedule() is called in a couple places without first checking
whether rb_thread_critical is set.  For instance, Thread.pass from a
critical thread can cause the woken thread to become critical.
This cannot be the desired behavior!

(Or, again, am I missing something?)

Thread.pass _should_ behave like Thread.stop.  The calling thread clears
Thread.critical and reschedules in one atomic operation.

Even single threaded scripts may need to shield themselves against
taking exceptions at inappropriate times due to traps.  The 'C' code 
does this by setting rb_prohibit_interrupt, but I cannot find any 
analogous ruby method for this.  So, these patches also prohibit trap 
evaluation when the Thread is critical.

I don't think any of these changes should affect sanely designed
multi-threading scripts.  After all, any such scripts that rely on the 
current behavior are working due, mostly, to luck ;-)  Specifically,
they are relying on the order in which rb_thread_schedule() wakes
threads.

These patches are against the 1.6.8 release.
I believe none of this has been changed in the 1.8.x series,
but I have not looked closely.  I hope those of you running
the 1.8.x series will let me know.

This patch incorporates the following patch originally described in 
ruby-talk/124508.  I'm incorporating it into this mail because the 
unpatched readline extension also calls rb_thread_schedule 
unconditionally, and therefore breaks Thread.critical sections:


The Readline uses the the GNU readline library's event handling hook
to try to ensure that other ruby threads do not block while it is
awaiting user keystrokes from STDIN.  I assert that the use of the event
hook for this purpose is unnecessary and, furthermore, the current
implementation can set the Thread.critical flag in other threads.

The current threadling_event() hook function is:

static int
readline_event()
{
    CHECK_INTS;
    rb_thread_schedule();  //bug here if Thread.critical !!!
    return 0;
}

If Thread.critical is set when readline is invoked, readline_event()
will still schedule any other thread that becomes ready.  This can
cause another thread, that was not in a critical section, to interrupt
one that was.  So, now we have a thread becoming "critical"
asynchronously.  This leads to some very weird lockups -- some of which
have been reported on this mailing list.

I think a quick fix might be to remove the unconditional
rb_thread_schedule() call from readline_event.  But, a better
fix is to remove the readline_event() function entirely.

So, how will we avoid blocking other threads during keyboard reads?
Replace readline_event() with this:

static int
readline_getc (ignored)
    FILE *ignored;
{
    VALUE string = rb_funcall (rb_stdin, rb_intern("sysread"),
						1, INT2FIX(1));
    return RSTRING(string)->ptr[0];  //single byte read
}


Thus, we redefine the function that GNU readline uses to read the input 
stream to be one based on Ruby's IO#sysread and let Ruby's own IO class
deal with all its threading nastyness.

This has been tested (fairly extensively) in our own multi-threading
motion control application under Ruby 1.6.8.  The same bug appears
to exist in the 1.8.x series, although I have not tried
this patch against 1.8.x.  I'm hoping someone on the list whose
running 1.8.x (everyone?) will test this.



Please let me know what you think of all this.

   Thanks.

-- Brent Roman


Patch from 1.6.8-release to 1.6.8.mbari follows:
---------------------------

diff -Ndaur ruby-1.6.8-release/eval.c ruby-1.6.8-mbari/eval.c
--- ruby-1.6.8-release/eval.c	Mon Dec 16 07:34:22 2002
+++ ruby-1.6.8-mbari/eval.c	Mon Dec 27 19:36:54 2004
@@ -2,8 +2,8 @@

    eval.c -

-  $Author: nobu $
-  $Date: 2002/12/16 06:50:43 $
+  $Author: brent $
+  $Date: 2004/12/20 05:17:18 $
    created at: Thu Jun 10 14:22:17 JST 1993

    Copyright (C) 1993-2001 Yukihiro Matsumoto
@@ -5362,7 +5362,6 @@
  		    return Qtrue;
  		}
  		CHECK_INTS;
-		rb_thread_schedule();
  	    }
  	}
      }
@@ -6971,6 +6970,7 @@
      return body;
  }

+
  void
  Init_Proc()
  {
@@ -7703,7 +7703,7 @@
      int need_select = 0;
      int select_timeout = 0;

-    rb_thread_pending = 0;
+    rb_thread_pending = rb_thread_critical = 0;
      if (curr_thread == curr_thread->next
  	&& curr_thread->status == THREAD_RUNNABLE)
  	return;
@@ -8228,7 +8228,6 @@
  {
      enum thread_status last_status = THREAD_RUNNABLE;

-    rb_thread_critical = 0;
      if (curr_thread == curr_thread->next) {
  	rb_raise(rb_eThreadError, "stopping only thread\n\tnote: use sleep to 
stop forever");
      }
diff -Ndaur ruby-1.6.8-release/ext/readline/readline.c 
ruby-1.6.8-mbari/ext/readline/readline.c
--- ruby-1.6.8-release/ext/readline/readline.c	Thu Jul 11 08:22:58 2002
+++ ruby-1.6.8-mbari/ext/readline/readline.c	Mon Dec 27 19:32:13 2004
@@ -8,6 +8,7 @@

  #include "ruby.h"
  #include "rubysig.h"
+#include "rubyio.h"

  static VALUE mReadline;

@@ -23,13 +24,14 @@
  #endif

  static int
-readline_event()
+readline_getc (ignored)
+  FILE *ignored;
  {
-    CHECK_INTS;
-    rb_thread_schedule();
-    return 0;
+  VALUE string = rb_funcall (rb_stdin, rb_intern("sysread"), 1, 
INT2FIX(1));
+  return RSTRING(string)->ptr[0];  //single byte read
  }

+
  static VALUE
  readline_readline(argc, argv, self)
      int argc;
@@ -461,6 +463,6 @@

      rl_attempted_completion_function
  	= (CPPFunction *) readline_attempted_completion_function;
-    rl_event_hook = readline_event;
+    rl_getc_function = readline_getc;
      rl_clear_signals();
  }
diff -Ndaur ruby-1.6.8-release/regex.c ruby-1.6.8-mbari/regex.c
--- ruby-1.6.8-release/regex.c	Tue Nov 19 13:36:29 2002
+++ ruby-1.6.8-mbari/regex.c	Mon Dec 27 19:31:13 2004
@@ -65,11 +65,11 @@
  #include "defines.h"

  # define RUBY
-extern int rb_prohibit_interrupt;
+extern int rb_prohibit_interrupt, rb_thread_critical;
  extern int rb_trap_pending;
  void rb_trap_exec _((void));

-# define CHECK_INTS if (!rb_prohibit_interrupt) {\
+# define CHECK_INTS if (!(rb_prohibit_interrupt | rb_thread_critical)) {\
      if (rb_trap_pending) rb_trap_exec();\
  }

diff -Ndaur ruby-1.6.8-release/rubysig.h ruby-1.6.8-mbari/rubysig.h
--- ruby-1.6.8-release/rubysig.h	Wed Feb 27 04:50:30 2002
+++ ruby-1.6.8-mbari/rubysig.h	Mon Dec 27 19:31:25 2004
@@ -2,8 +2,8 @@

    rubysig.h -

-  $Author: matz $
-  $Date: 2002/02/27 04:50:30 $
+  $Author: brent $
+  $Date: 2004/12/20 05:18:43 $
    created at: Wed Aug 16 01:15:38 JST 1995

    Copyright (C) 1993-2000 Yukihiro Matsumoto
@@ -66,22 +66,20 @@
  void rb_thread_schedule _((void));
  #if defined(HAVE_SETITIMER) && !defined(__BOW__)
  EXTERN int rb_thread_pending;
-# define CHECK_INTS if (!rb_prohibit_interrupt) {\
+# define CHECK_INTS if (!(rb_prohibit_interrupt | rb_thread_critical)) {\
+    if (rb_thread_pending) rb_thread_schedule();\
      if (rb_trap_pending) rb_trap_exec();\
-    if (rb_thread_pending && !rb_thread_critical) rb_thread_schedule();\
  }
  #else
  /* pseudo preemptive thread switching */
  EXTERN int rb_thread_tick;
  #define THREAD_TICK 500
-#define CHECK_INTS if (!rb_prohibit_interrupt) {\
-    if (rb_trap_pending) rb_trap_exec();\
-    if (!rb_thread_critical) {\
+#define CHECK_INTS if (!rb_prohibit_interrupt && !rb_thread_critical) {\
  	if (rb_thread_tick-- <= 0) {\
  	    rb_thread_tick = THREAD_TICK;\
  	    rb_thread_schedule();\
  	}\
-    }\
+    if (rb_trap_pending) rb_trap_exec();\
  }
  #endif

diff -Ndaur ruby-1.6.8-release/version.h ruby-1.6.8-mbari/version.h
--- ruby-1.6.8-release/version.h	Tue Dec 24 08:30:41 2002
+++ ruby-1.6.8-mbari/version.h	Mon Dec 27 19:31:41 2004
@@ -1,11 +1,11 @@
  #define RUBY_VERSION "1.6.8"
-#define RUBY_RELEASE_DATE "2002-12-24"
+#define RUBY_RELEASE_DATE "2004-12-17"
  #define RUBY_VERSION_CODE 168
-#define RUBY_RELEASE_CODE 20021224
+#define RUBY_RELEASE_CODE 20041217

  #define RUBY_VERSION_MAJOR 1
  #define RUBY_VERSION_MINOR 6
  #define RUBY_VERSION_TEENY 8
-#define RUBY_RELEASE_YEAR 2002
+#define RUBY_RELEASE_YEAR 2004
  #define RUBY_RELEASE_MONTH 12
-#define RUBY_RELEASE_DAY 24
+#define RUBY_RELEASE_DAY 17