Lennon Day-Reynolds <rcoder / gmail.com> wrote in message news:<5d4c612404091310044fa47610 / mail.gmail.com>...
> Chris,
>
> There are quite a few reasons this could be going wrong, both inside
> and outside of your Ruby test harness. Since you said that the problem
> was only reproducible on long (>1hr.) tests, though, I would suspect
> socket connection (or other system-level resource) timeouts.

This is what I suspect too, but the weird thing is that I would expect to get something like a 0 result out of the read, or a 'connection reset by peer' or some such, not an Invalid Handle sort of error.

The other thing that bugs me is that we got the error on the receive side of the connection; I don't want that! If it's gonna fail, I'd much rather it fail on the send. Could that be related to how the connection gets torn down? I.e., when you close the write end of a socket, you send a FIN; so if the other end ('them') has closed, it will have sent its FIN, immediately gotten the ACK (courtesy of our kernel) and gone into FIN_WAIT_2, whereas the receiving end ('us') sits in CLOSE_WAIT (waiting for the app to notice and close the socket). So then we go ahead and write the request (there's no way for the other end to signal that it won't read any more; on Unix, if the other end is closed, I generally find that the first write succeeds and the second one fails with EPIPE), and then we try to read, which is when we get the invalid handle error.

The other possibility is that this socket somehow got closed but we're still using it... but how would that happen between writing and reading? The only issue with all this is that the liveness check should already have detected that the socket was closed, ya? It does a quick select() poll on the socket to see if it's readable, and I would think that would notice a closed socket...
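Here is a minimal Ruby sketch of a select()-style liveness check like the one described above. `socket_alive?` is a hypothetical helper for illustration, not DRb's actual code. Since select() reports a socket as readable both when data is waiting and when the peer has sent its FIN, the sketch peeks at one byte with `Socket::MSG_PEEK`: an empty result (`nil` on recent Rubies, `""` on older ones) means the peer has closed its end, i.e. our side is sitting in CLOSE_WAIT.

```ruby
require 'socket'

# Hypothetical helper (not part of DRb): returns false if the peer has
# closed or reset the connection, or if we already closed it locally.
def socket_alive?(sock)
  # Zero timeout: just poll, never block.
  readable, = IO.select([sock], nil, nil, 0)
  return true unless readable # nothing pending, so no FIN seen yet

  # Readable could mean "data waiting" or "peer sent FIN"; peek to tell
  # them apart without consuming anything from the stream.
  data = sock.recv_nonblock(1, Socket::MSG_PEEK, exception: false)
  return true if data == :wait_readable # spurious wakeup, no data yet

  # nil (newer Rubies) or "" (older Rubies) means orderly shutdown (EOF).
  !(data.nil? || data.empty?)
rescue Errno::ECONNRESET, Errno::EPIPE, IOError
  # RST from the peer, or the socket was closed on our side.
  false
end
```

With a `UNIXSocket.pair`, for example, closing one end makes `socket_alive?` return false on the other. A check like this only sees a FIN or RST that has already arrived, so it narrows the window but can't rule out the peer closing between the check and the next write or read.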
>
> I would suggest adding a 'ping()' method to your DRb server, and then
> having clients call it periodically (say, every 5-10 seconds) in a
> background thread or process, as well as optionally before any call
> with important data to be transferred. That way, both the client and
> the server can detect connection failures before you have to worry
> about losing data.

Well, that's the problem; I'm not totally sure that will actually help us, since the connection seemed to fail midway through. So maybe the ping will succeed, which is all well and good, but how do I know the data traffic won't fail right after? Not to mention that the pool may grow, so the ping might get a different connection than the subsequent 'real' call...

> DRb is cheap wire-level scaffolding, but it's not a reliable messaging
> system; that has to be handled at the application level.

Yup; I'm not expecting foolproof-ness (I'm much too ingenious), but I'm just curious how it can fail in the rather bizarre way it seems to be failing...

Thanks,
Chris
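A minimal sketch of the ping/heartbeat approach suggested in the quoted reply, assuming Ruby's standard `drb` library. `PingableFront`, `start_heartbeat`, and its `interval:`/`on_failure:` parameters are hypothetical names for illustration. The client thread calls `ping` every `interval` seconds and reports the first `DRb::DRbConnError` (or lower-level `SystemCallError`) it sees.

```ruby
require 'drb/drb'

# Hypothetical front object for the DRb server: alongside the real
# service methods, it exposes a cheap ping that does no work.
class PingableFront
  def ping
    :pong
  end
end

# Hypothetical client-side heartbeat. Calls remote.ping every `interval`
# seconds on a background thread; on the first failure it hands the
# exception to on_failure and stops. Returns the Thread.
def start_heartbeat(remote, interval: 5, on_failure:)
  Thread.new do
    loop do
      begin
        remote.ping
      rescue DRb::DRbConnError, SystemCallError => e
        on_failure.call(e)
        break # stop pinging once the link is known to be down
      end
      sleep interval
    end
  end
end
```

Wiring it up might look like `DRb.start_service('druby://localhost:9000', PingableFront.new)` on the server and `start_heartbeat(DRbObject.new_with_uri('druby://localhost:9000'), on_failure: ->(e) { warn "link down: #{e.message}" })` on the client, where the URI is a placeholder. As the reply points out, a successful ping says nothing about the next call, which may fail right afterwards or go out over a different pooled connection, so the real calls still need their own `rescue DRb::DRbConnError`.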