Hello all,

I would like to apologize in advance for posting to the core list as a
newbie, but this seems like the right place. I'm interested in adding
lightweight asynchronous messaging to Ruby. My idea is to create
messaging components, each composed of a thread and a queue, with
messages sent between components via the queues. This allows for
significant parallelism with fewer thread safety issues, as each
component processes its messages one at a time. Of course any shared
objects would still have thread safety issues, but if a component only
operates on its own objects the issues should be significantly reduced.
We have implemented a similar system in C for a consumer audio product
and it has worked OK, but I would like to use a higher level language
and Ruby looks like a good fit. Performance is a concern, though,
especially since we are using 100 MHz ARM7s.
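
To make the idea concrete, here is a minimal sketch of such a component.
The names Component, send_msg and stop are made up for illustration;
this is neither the C implementation mentioned above nor a proposed API,
just one way the thread-plus-queue pairing could look.

```ruby
require 'thread'

# Hypothetical sketch: Component, #send_msg and #stop are illustrative names.
class Component
  def initialize(&handler)
    @queue = Queue.new
    @worker = Thread.new do
      # Messages are handled strictly one at a time, in arrival order.
      while (msg = @queue.deq) != :stop
        handler.call(msg)
      end
    end
  end

  # Asynchronous send: enqueue and return immediately, with no reply value.
  def send_msg(msg)
    @queue.enq(msg)
    nil
  end

  # Ask the worker to finish after handling earlier messages, then wait.
  def stop
    @queue.enq(:stop)
    @worker.join
  end
end

# Usage: only the component's own thread touches `received` while it runs.
received = []
logger = Component.new { |msg| received << msg }
3.times { |i| logger.send_msg("message #{i}") }
logger.stop
```

Because the handler only runs on the component's thread, objects owned
by the component need no locking; anything shared between components
still would.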

Anyway, I've done a trial implementation and noticed that as I
increased the number of messaging components, the number of messages per
second dropped. So I looked into why and did some measurements
using the following program:

-----------------------
require 'thread'

def prt(s)
  STDOUT.print s
  STDOUT.flush
end

ts = []
que = []

count = ARGV[0].to_i
prt "#{count} threads being made\n"

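count.times { |i|
  # Create the queue before starting the thread so que[i] is
  # guaranteed to exist when the main thread later calls que[i].enq.
  que[i] = Queue.new
  ts[i] = Thread.new(que[i]) { |q|
    q.deq   # block until the main thread sends "done"
  }
}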

prt "Make background thread\n"

counter = 0

background = Thread.new {
  while true
    counter += 1
  end
}

prt "sleeping 5 seconds ..."
sleep(5)
prt "\nDone counter=#{counter}\n"

background.kill
background.join

prt "Waiting for threads to end\n"

count.times { |i|
  que[i].enq("done")
}
count.times { |i|
  ts[i].join
}
-----------------------

The program creates a number of threads which do nothing except wait for 
a message plus one background thread which increments a counter. I get 
the following results:

Cygwin 2.4ghz 1GB ram P4
ruby 1.9.0 (2005-10-23) [i386-cygwin]

   0: Done counter=13589805
  10: Done counter=13572629
 500: Done counter= 8916522
1000: Done counter= 5344578
5000: Done counter=  270010

Cygwin 2.4ghz 1GB ram P4
ruby 1.8.2 (2004-12-25) [i386-mswin32]

   0: Done counter=11302589
  10: Done counter=11211705
 500: Done counter= 6886940
1000: Done counter= 3537857
5000: Done counter=  238339

Linux amd64 3200+ 500MB ram
ruby 1.9.0 (2005-10-23) [x86_64-linux]

   0: Done counter=20029982
  10: Done counter=19431449
 500: Done counter=19475618
1000: Done counter=19004082
5000: Done counter=18612325


As you can see, with cygwin on both ruby 1.8.2 and 1.9.0, as the number
of threads increases the value counter attains over the 5 seconds
decreases significantly. For Linux on the amd64 it still decreases, but
much less severely. I also ran it on my 800 MHz PowerBook on OS X and it
too exhibited the Linux behavior, although the count was much lower; the
absolute number isn't critical (at the moment anyway).

I then investigated why, and there appear to be two reasons. The
fundamental reason for the decreasing performance on all
implementations appears to me to be the algorithm in
rb_thread_schedule. To choose the next thread to run, this routine
looks at every thread in the system at least twice and possibly as many
as 5 times. Thus, as the number of threads increases, the performance
decreases. But why is this less of a problem for Linux than for cygwin?
My guess was that it had to do with how often rb_thread_schedule was
being called. I looked deeper at the code, and for a thread such as
background, scheduling appears to be driven by the rb_eval routine
invoking the macro CHECK_INTS defined in rubysig.h.

As most of you probably know, CHECK_INTS has two implementations. One
uses a counter, rb_thread_tick, which causes rb_thread_schedule to be
called once every 500 invocations of CHECK_INTS. The other causes
rb_thread_schedule to be run approximately every 10ms. The counter
technique is used by cygwin and the timer technique is used by Linux. As
a quick test, I changed the value of THREAD_TICK from 500 to 5000 and
sure enough the performance for cygwin increased significantly.
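
A rough way to see why the two techniques behave so differently is a toy
cost model. Everything in it is an assumption made for illustration: one
CHECK_INTS costs 1 unit, one rb_thread_schedule call costs scan_cost
units per existing thread, and the budget and timer_calls figures are
invented. It models the shape of the effect, not the measured numbers.

```ruby
# Toy cost model, NOT a measurement. Every number below is an assumption:
# one CHECK_INTS costs 1 unit, one scheduler call costs scan_cost units
# per existing thread, and the 5 second window holds `budget` units.

# Counter variant: each round of `tick` checks pays for one full scan.
def counter_variant(threads, tick = 500, budget = 20_000_000, scan_cost = 1)
  cost_per_round = tick + scan_cost * threads
  (budget / cost_per_round) * tick
end

# Timer variant: a fixed number of scans (about one per 10ms over
# 5 seconds), no matter how fast the loop itself runs.
def timer_variant(threads, timer_calls = 500, budget = 20_000_000, scan_cost = 1)
  [budget - timer_calls * scan_cost * threads, 0].max
end

[0, 10, 500, 1000, 5000].each do |n|
  printf("%5d threads: counter=%9d  timer=%9d\n",
         n, counter_variant(n), timer_variant(n))
end
```

In this model the counter variant pays for a scan in proportion to how
much work the loop does, so its throughput collapses as threads are
added, while the timer variant pays a fixed number of scans. Calling
counter_variant(n, 5000) mirrors the THREAD_TICK change from 500 to
5000 and moves in the same direction as the quick test.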

I then looked at where and how HAVE_SETITIMER and _THREAD_SAFE, the
defines that control the timer usage, are set, and found that if I added
_THREAD_SAFE to my 1.9.0 config.h the performance on cygwin approached
what I saw under Linux:

Cygwin 2.4ghz 1GB ram P4 with DEFINE "_THREAD_SAFE"
ruby 1.9.0 (2005-10-24) [i386-cygwin]

   0: Done counter=16975581
  10: Done counter=16741738
 500: Done counter=16621442
1000: Done counter=16333656
5000: Done counter=13899214

Soooooooo:

1) What are the issues with using _THREAD_SAFE for cygwin?

2) If I were to create a more efficient implementation of 
rb_thread_schedule, shooting for O(1), would that be interesting to anyone?

3) To do a more efficient implementation I've already taken the first
step and split eval.c into 4 files: eval.c, eval.h, thread.c and
thread.h. Is anyone interested in these files? Of course they would need
to be evaluated by some experts and probably changed, but I think
refactoring out the threading code makes sense. Note: Evan originally
suggested this to me when we talked at the Ruby Conference.

4) Is anyone interested in lightweight asynchronous messaging for
Ruby? By lightweight I mean the performance decrease due to message
passing is low (use binary for marshaling), and by asynchronous I mean
that sending a message never blocks nor returns a value. Messages are
placed on a queue that the receiver will process at a time of its
choosing.
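
For question 2, here is a minimal sketch of the general direction,
written in Ruby only to show the data structure. ReadyQueue and its
method names are hypothetical and nothing like this exists in eval.c;
the sketch also ignores priorities, timeouts and file descriptor waits,
which a real rb_thread_schedule replacement would have to handle. The
point is only that blocked threads are kept off the list that the
scheduler walks.

```ruby
# Hypothetical sketch: ReadyQueue and its methods are illustrative names,
# not part of Ruby's interpreter.
class ReadyQueue
  def initialize
    @ready = []     # runnable tasks only, in round-robin order
    @blocked = {}   # tasks waiting on something, keyed by task
  end

  # A new or newly woken task joins the back of the ready list.
  def wake(task)
    @blocked.delete(task)
    @ready.push(task)
  end

  # The running task stops being runnable (e.g. it waits on a queue).
  def block(task)
    @blocked[task] = true
  end

  # Pick the next task without visiting any blocked ones.
  # Returns nil when nothing is runnable.
  def next_task
    @ready.shift
  end

  # The running task gives up the CPU but stays runnable.
  def yield_task(task)
    @ready.push(task)
  end
end

# Usage mirroring the test program: one busy task, 5000 idle waiters.
sched = ReadyQueue.new
sched.wake(:background)
5000.times { |i| sched.block("waiter #{i}") }
task = sched.next_task   # found without touching the 5000 waiters
sched.yield_task(task)
```

With this layout the cost of choosing the next task does not depend on
how many tasks are blocked.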

Cheers,

Wink