I have some final results on the problem described.

First, I must correct something in my initial post: I had stated that 
there was an occasional high delay between the 'select' call and the 
'select' return (i.e., although the timeout set in the select was 
50 msec, the delay could be as long as 5 seconds).

Actually, after tracing all the calls in that section of code, I found 
that the delay occurs between the 'select' return and the 'recvfrom'. 
The details that follow may be of interest to anyone using Ruby for 
fast communication.

Test environment:
- pure Ruby 1.9.2 (no gems, just the 'socket' library) on an Ubuntu 
machine (lots of memory)
- Ruby sends 4 UDP msgs per second to a micro-controller
- The micro (C/assembler) responds (UDP) within 10-30 milliseconds
- So it is 4 msgs sent and 4 responses rcvd every second
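
For anyone who wants to reproduce the measurement, here is a minimal sketch of how 'delay_sel_rcv' could be taken. It is not the original program: the `exchange` helper, the 1024-byte buffer and the local echo thread (a stand-in for the micro-controller, so the sketch runs on one machine) are all assumptions made for illustration.

```ruby
require 'socket'

# Hypothetical helper: send one message, wait up to `timeout` seconds for
# a reply, and return [reply, delay_sel_rcv], where delay_sel_rcv is the
# time in seconds between the return of select and the return of recvfrom.
def exchange(sock, msg, host, port, timeout = 0.05)
  sock.send(msg, 0, host, port)
  ready = IO.select([sock], nil, nil, timeout)
  return [nil, nil] unless ready        # timed out, no response
  t_select = Time.now                   # select has just returned
  reply, _addr = sock.recvfrom(1024)
  [reply, Time.now - t_select]
end

# Stand-in for the micro-controller (assumption): a local UDP echo thread.
responder = UDPSocket.new
responder.bind('127.0.0.1', 0)          # port 0 = let the OS pick a free port
port = responder.addr[1]
Thread.new do
  loop do
    data, addr = responder.recvfrom(1024)
    responder.send(data, 0, addr[3], addr[1])
  end
end

client = UDPSocket.new
reply, delay = exchange(client, 'ping', '127.0.0.1', port)
flag = (delay && delay > 1.0) ? '!! ' : ''   # mark abnormal results for grep
puts format('%s: <->: %sdelay_sel_rcv=%.9f',
            Time.now.strftime('%H:%M:%S:%L'), flag, delay.to_f)
```

The timestamp right after IO.select returns is the important part; everything that Ruby does between that point and the return of recvfrom (including any pause of the interpreter) ends up in the measured value.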

This is what I saw since midnight in one of the systems (the symbol 
'<->' means 1 msg sent and response; the symbol '!!' was inserted to 
grep all abnormal results):

# Time as Hour:Min:Sec:Msec; the 'delay_sel_rcv' value (the time between 
the return of 'select' and 'recvfrom') is in seconds

# log from midnight; all perfect until 1:21 am

01:21:19:914: <->: !! delay_sel_rcv=10.006525661
01:21:29:928: <->: !! delay_sel_rcv=10.010217133
01:21:39:937: <->: !! delay_sel_rcv=10.004327574
01:21:49:954: <->: !! delay_sel_rcv=10.011541082
01:21:59:972: <->: !! delay_sel_rcv=10.005877574
01:22:05:973: <->: !! delay_sel_rcv= 5.998151639

# then all ok until:

02:22:27:374: <->: !! delay_sel_rcv=10.008022384
02:22:37:394: <->: !! delay_sel_rcv=10.008430684
02:22:47:401: <->: !! delay_sel_rcv=10.004019076
02:22:57:409: <->: !! delay_sel_rcv=10.005836859
02:23:07:580: <->: !! delay_sel_rcv=10.008476556
02:23:17:610: <->: !! delay_sel_rcv=10.007506338
02:23:27:642: <->: !! delay_sel_rcv=10.007311141
02:23:37:655: <->: !! delay_sel_rcv=10.008225368
02:23:47:685: <->: !! delay_sel_rcv=10.018187389

# then all ok until
04:24:08:873: <->: !! delay_sel_rcv=10.006355125
...

We can see from the above:

- the first 80 minutes (from midnight to 01:21) went fine
- then we see several delays of 10 seconds, in the same minute (each 10 
seconds apart from the other)
- for 1 hour all was perfect again, exchanging some 12,000 messages with 
perfect timing
- then we have 9 delays of 10 seconds (again separated by 10 seconds)
- for 1 hour all went fine again; then the cycle repeats
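
The grouping above was done by eye; as a rough sketch, it could also be done with a few lines of Ruby over the '!!' lines. The `parse` and `bursts` helpers and the 60-second gap used to separate bursts are my own assumptions; the sample data is just the log excerpt shown earlier.

```ruby
LOG = <<-EOS
01:21:19:914: <->: !! delay_sel_rcv=10.006525661
01:21:29:928: <->: !! delay_sel_rcv=10.010217133
01:21:39:937: <->: !! delay_sel_rcv=10.004327574
01:21:49:954: <->: !! delay_sel_rcv=10.011541082
01:21:59:972: <->: !! delay_sel_rcv=10.005877574
01:22:05:973: <->: !! delay_sel_rcv= 5.998151639
02:22:27:374: <->: !! delay_sel_rcv=10.008022384
02:22:37:394: <->: !! delay_sel_rcv=10.008430684
02:22:47:401: <->: !! delay_sel_rcv=10.004019076
02:22:57:409: <->: !! delay_sel_rcv=10.005836859
02:23:07:580: <->: !! delay_sel_rcv=10.008476556
02:23:17:610: <->: !! delay_sel_rcv=10.007506338
02:23:27:642: <->: !! delay_sel_rcv=10.007311141
02:23:37:655: <->: !! delay_sel_rcv=10.008225368
02:23:47:685: <->: !! delay_sel_rcv=10.018187389
04:24:08:873: <->: !! delay_sel_rcv=10.006355125
EOS

LINE = /\A(\d\d):(\d\d):(\d\d):(\d{3}): <->: !! delay_sel_rcv=\s*([\d.]+)/

# Hypothetical helper: turn one '!!' log line into seconds since midnight
# plus the measured delay; returns nil for lines that do not match.
def parse(line)
  m = LINE.match(line) or return nil
  h, min, s, ms = m[1..4].map { |x| x.to_i }
  { :at => h * 3600 + min * 60 + s + ms / 1000.0, :delay => m[5].to_f }
end

# Hypothetical helper: group events into bursts; a new burst starts when
# more than max_gap seconds (assumed: 60) separate two abnormal results.
def bursts(events, max_gap = 60.0)
  events.each_with_object([]) do |e, groups|
    if groups.empty? || e[:at] - groups.last.last[:at] > max_gap
      groups << [e]
    else
      groups.last << e
    end
  end
end

events = LOG.lines.map { |l| parse(l) }.compact
bursts(events).each do |b|
  puts format('burst of %d delays starting at %.0f s after midnight', b.size, b.first[:at])
end
```

On the excerpt above this gives three bursts of 6, 9 and 1 delays (the last one is cut short by the '...' in the log).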

This pattern can only indicate (in my view) the garbage collector, which 
Ruby seems to run for 10 seconds several times within the same minute or 
so. I could not put calls to GC.disable/enable around the 
select/recvfrom to get the final proof (so as not to interfere with a 
real experiment that was moving heavy equipment). Notice that, if it is 
the GC, disabling/enabling it will only shift the problem from one area 
of the communication handler to another (and thus have a similar impact 
on the applications using the comm handler).
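
For the record, this is roughly what such a check could look like; I have not run it on the real system. The helper names `with_gc_stats` and `without_gc` are hypothetical. The first one does not disable anything: it only counts GC runs (GC.count) and reads the time recorded by GC::Profiler around a block. The second is the GC.disable/enable variant discussed above.

```ruby
# Hypothetical helper (not from the original program): run a block and
# report how many GC runs happened during it and the GC time recorded
# by the profiler.
def with_gc_stats
  GC::Profiler.enable
  GC::Profiler.clear
  runs_before = GC.count
  started = Time.now
  result = yield
  stats = {
    :elapsed => Time.now - started,
    :gc_runs => GC.count - runs_before,
    :gc_time => GC::Profiler.total_time
  }
  [result, stats]
ensure
  GC::Profiler.disable
end

# The alternative discussed above: keep the GC off for the critical
# section, restoring the previous state afterwards.
def without_gc
  was_disabled = GC.disable   # true if the GC was already disabled
  yield
ensure
  GC.enable unless was_disabled
end

# Demo: force one GC run inside the measured block.
result, stats = with_gc_stats do
  GC.start
  :done
end
puts format('elapsed=%.6f gc_runs=%d gc_time=%.6f',
            stats[:elapsed], stats[:gc_runs], stats[:gc_time])
```

If a 10-second 'delay_sel_rcv' coincided with gc_runs > 0 and a gc_time of similar size, that would support the GC explanation; if gc_runs stayed at 0 during a long delay, the cause would have to be looked for elsewhere.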

Interestingly, this problem does not happen within 1 computer; I used 
the identical Ruby program but replaced the firmware with a Ruby 
simulator (with the same machines, the same UDP and the same binary 
strings exchanged); in a test of 10 hours, I only saw occasional 
"delays" between select and recvfrom, but on the order of 100 
milliseconds, and never of 5/10 seconds.
This would seem to indicate an inefficiency in the UDP stack (when used 
across computers).

My conclusion is that if you want a predictable delay (with values 
spread across a 'tight bell' curve, not just increasing the timeout to 
cope with 'everything'), you must use (for that section of the software) 
a compiled language; at least until garbage collector technology 
changes.

I hope that this is useful to others who use Ruby for high speed 
communication (and the ones working on garbage collectors).
--

Last note: one year ago I met at a party a JPL engineer working on the 
Mars exploration program; he was an admirer of Ruby, but after some 
jokes on the expressivity and beauty of old and new languages, he added 
that they would never use scripting languages because "we don't want the 
garbage collector to kick in just when we should begin to slow down the 
spacecraft near Mars, and miss the landing! In fact, we don't even use 
C++, as we did not find it totally predictable; so we will still use C 
for years to come".

I never knew how well I would learn to appreciate his point.

Raul Parolari

raulparolari / gmail.com

-- 
Posted via http://www.ruby-forum.com/.