Tanaka Akira <akr / fsij.org> wrote:
> 2010/7/6 Eric Wong <redmine / ruby-lang.org>:
> >
> > sendfile() may return with a short write upon a client disconnect.  Instead of
> > retrying and getting an error, Ruby tries to force a select() on the descriptor
> > which fails to detect the disconnect.  This causes IO.copy_stream to hang,
> > (possibly until TCP keepalives kick in).  IO.copy_stream should raise
> > immediately.
> 
> Thank you for the reproducible script and fix.
> 
> I'll commit your fix.

Thank you for looking into this.

> However I think the Linux select behavior which doesn't notify writability on
> disconnected TCP socket is suspicious.

<snip>

> FreeBSD and Solaris notify writability.

<snip>

> I think select should notify writability when write would not block.
> Cleary write doesn't block on disconnected socket.
> 
> Linux also notify writability for UNIX domain socket pair.

<snip>

UNIX domain sockets are easy to do notification for since they're always
on the same host.  TCP might be harder to detect (and thus the Linux
folks choose not to bother at all) because the client is on a different
machine and it might lose a physical connection.

How does FreeBSD or Solaris behave if a client is on a different machine
and has the network cable pulled out?  In the case of physically
disconnected network cable, the client TCP stack has no way to notify
the server of a disconnect.  "kill -9" or even normal OS shutdown would
give the TCP stack a chance to properly shutdown the connection.

There are a few more instances of "errno = EAGAIN" assignments in io.c
that look suspicious to me.   My proposed fixes are below, but I'm
having trouble reproducing the badness I was seeing with IO.copy_stream
in these code paths:

diff --git a/io.c b/io.c
index 5129a14..108af7e 100644
--- a/io.c
+++ b/io.c
@@ -649,7 +649,7 @@ io_fflush(rb_io_t *fptr)
     if (0 <= r) {
         fptr->wbuf_off += (int)r;
         fptr->wbuf_len -= (int)r;
-        errno = EAGAIN;
+        goto retry;
     }
     if (rb_io_wait_writable(fptr->fd)) {
         rb_io_check_closed(fptr);
@@ -877,7 +877,8 @@ io_binwrite(VALUE str, rb_io_t *fptr, int nosync)
         if (0 <= r) {
             offset += r;
             n -= r;
-            errno = EAGAIN;
+            if (offset < RSTRING_LEN(str))
+                goto retry;
         }
         if (rb_io_wait_writable(fptr->fd)) {
             rb_io_check_closed(fptr);
-- 
Eric Wong