Darrin Thompson wrote:
> I'm running a fairly complicated build and test system with DRb over
> Ruby 1.8.6. It involves 12 Linux machines running several different
> distro versions and one Windows machine.
>
> Lately I've been having problems where once in a while the machines
> involved in this system just stop communicating, and I can't figure out
> why. I've found on occasion I can work around the problem by changing
> the order of the operations or the frequency of them. It's more or less
> random when it occurs.
>
> The only thing I can think of is that this all started when I added SUSE
> 9.3 and 9.4 machines to this system.
>
> The other possibility is that now I have 12 Linux machines and a Windows
> machine all more or less arbitrarily talking with each other, so there
> might be a slowly increasing probability of a deadlock that I'm suddenly
> noticing because it's more likely with more machines.
>
> I'm sitting here thinking of exotic ways TCP could be misconfigured out
> of the box on SUSE 9. But deep in my soul I'm sure it's some stupid code
> I wrote.
>
> Anyway, the idea here is that a Windows machine sends messages to
> several Linux machines and the Linux machines send back log messages and
> occasionally a series of messages that represent the contents of a file.
>
> If anyone has insight, I'd appreciate it. I'm running out of good ideas
> here.
>
> --
> Darrin

It might help to add

  Thread.abort_on_exception = true

in case a DRb thread is dying silently. (DRb might be smarter than that, though.)

--
vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407
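A minimal sketch of why the Thread.abort_on_exception suggestion can help. The two methods and the failing "worker" threads are hypothetical stand-ins for the threads DRb spawns to service remote calls; they only illustrate the flag's effect. In the real program the flag would be set once at startup, before DRb.start_service. On current Ruby versions the dead thread's exception is re-raised in the main thread, which is why the second demo can rescue it; in a real server you would normally let it terminate the process so the failure is obvious.

```ruby
# Without the flag: the failing thread simply stops. Nothing reaches the
# main thread unless someone calls join on it. (Ruby 2.5 and later also
# print a warning to stderr by default via Thread.report_on_exception;
# Ruby 1.8.6 stays silent.)
def silent_failure_demo
  Thread.abort_on_exception = false
  worker = Thread.new { raise "worker died" }  # hypothetical DRb-style worker
  sleep 0.5                                     # main thread carries on unaware
  status = worker.alive? ? :still_running : :dead
  begin
    worker.join                                 # join re-raises the thread's exception
    [status, nil]
  rescue RuntimeError => e
    [status, e.message]
  end
end

# With the flag: an unhandled exception in any thread is raised in the
# main thread, so the failure cannot go unnoticed.
def abort_demo
  Thread.abort_on_exception = true
  begin
    Thread.new { raise "worker died" }
    sleep 0.5                                   # the worker's exception surfaces here
    :no_exception
  rescue RuntimeError => e
    e.message
  ensure
    Thread.abort_on_exception = false           # restore the default for the demo
  end
end
```

If a DRb thread were dying this way, the first pattern would look exactly like machines that "just stop communicating", while the second turns the hang into a visible error.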