Hi !

I have been debugging this for a few days, and finally ran strace on
the process.  I'm hoping someone on 'core could help me understand this
further.

I have a program that serves clients over TCP, and polls servers over
UDP, and under ruby 1.8.4 (linux 2.6.14.3) i'm seeing frequent seemingly
random moments when the whole process appears to block.  In strace, it
always looks about like this:

22:09:44.551164 socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 14
22:09:44.551231 connect(14, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("x.x.0.10")}, 28) = 0
22:09:44.551313 send(14, "\356\307\1\0\0\1\0\0\0\0\0\0\00274\003248\00219\00267\7"..., 43, 0) = 43
22:09:44.551402 gettimeofday({1138687784, 551419}, NULL) = 0
22:09:44.551462 poll([{fd=14, events=POLLIN}], 1, 5000) = 0   # ***STALLS HERE FOR 5 SEC***
22:09:49.558264 send(14, "\356\307\1\0\0\1\0\0\0\0\0\0\00274\003248\00219\00267\7"..., 43, 0) = 43

The thing is, in the course of trying to track this down (putting 
print statements in the program seemed to indicate the freezes were
occurring at random points), I had created a backtround thread that
just is in a simple loop, printing to a logfile every 0.1 seconds.

I.e:

  $bg_print_thread = Thread.new { loop{ $errlog.log("bg_print_thread"); sleep(0.1) } }
  $bg_print_thread.priority = 10

Every 0.1 seconds that should be printing to the log file, but the
whole process is getting blocked.

With strace, it's always a poll() statement showing on screen when
the process blocks, similar to:

22:09:44.551462 poll([{fd=14, events=POLLIN}], 1, 5000) = 0

Apparently that's a 5 second (5000 msec) delay?  How can I find where
that's coming from?

How could there ever be poll() with a 5000 msec delay, when I have that
sleep(0.1) loop $bg_print_thread thread?

At this point I'm guessing I need to be putting printouts in ruby itself
not my program?

Thanks for your help !

Regards,

Bill