2010/7/6 Eric Wong <normalperson / yhbt.net>:

> UNIX domain sockets are easy to do notification for since they're always
> on the same host.  TCP might be harder to detect (and thus the Linux
> folks choose not to bother at all) because the client is on a different
> machine and it might lose a physical connection.

If the kernel cannot detect the disconnect, how can the kernel cause EPIPE?

> How does FreeBSD or Solaris behave if a client is on a different machine
> and has the network cable pulled out?  In the case of physically
> disconnected network cable, the client TCP stack has no way to notify
> the server of a disconnect.  "kill -9" or even normal OS shutdown would
> give the TCP stack a chance to properly shutdown the connection.

I am not talking about such a physical disconnection.

I described the situation in which the kernel knows the connection is
disconnected.

The connection is disconnected by an RST packet.
The RST packet is generated when a normal packet is sent to a closed port.

  % ruby -rsocket -e '
  def netstat
    s = `netstat -n`
    s.each_line {|line| puts line if /State\s*$|127.0.0.1:8888/ =~ line }
    puts
  end
  serv = TCPServer.open("127.0.0.1", 8888)
  s1 = TCPSocket.open("127.0.0.1", 8888)
  s2 = serv.accept
  netstat
  s2.close
  netstat
  s1.write "a" rescue p $!
  netstat
  s1.write "a" rescue p $!
  p IO.select(nil, [s1], nil, 0)
  '
  Proto Recv-Q Send-Q Local Address           Foreign Address         State
  tcp        0      0 127.0.0.1:8888          127.0.0.1:34516         ESTABLISHED
  tcp        0      0 127.0.0.1:34516         127.0.0.1:8888          ESTABLISHED

  Proto Recv-Q Send-Q Local Address           Foreign Address         State
  tcp        0      0 127.0.0.1:8888          127.0.0.1:34516         FIN_WAIT2
  tcp        1      0 127.0.0.1:34516         127.0.0.1:8888          CLOSE_WAIT

  Proto Recv-Q Send-Q Local Address           Foreign Address         State

  #<Errno::EPIPE: Broken pipe>
  nil

At the first netstat call, the TCP states of
s1 (the local address is 127.0.0.1:34516) and
s2 (the local address is 127.0.0.1:8888) are both ESTABLISHED.

s2.close sends a FIN packet to s1.
s1 receives it and sends an ACK packet to s2.
This changes s2 to FIN_WAIT2 and s1 to CLOSE_WAIT.

The first s1.write "a" sends a normal data packet to s2.
Since the write system call doesn't wait for the result of the packet,
the system call itself succeeds.
But s2 has already been closed by the application and cannot accept data.
So s2 sends back an RST packet to s1 and the state of s2 changes to CLOSED.
Then s1 receives the RST packet, which changes the state of s1 to CLOSED.
This is why the third netstat call shows no connection.

The second s1.write "a" fails with EPIPE.
This is because the kernel knows s1 is CLOSED.
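
The sequence above can also be replayed without netstat. This is only a
minimal sketch: the helper name write_twice_after_peer_close is made up
for illustration, port 0 is used so the kernel picks a free port, and the
short sleep is an assumption that the RST has arrived on loopback by then.

```ruby
require "socket"

# Hypothetical helper (not from the original mail): replays the
# close / write / write sequence and returns the outcome of both writes.
def write_twice_after_peer_close
  serv = TCPServer.new("127.0.0.1", 0)   # port 0: let the kernel choose
  s1 = TCPSocket.new("127.0.0.1", serv.addr[1])   # connecting side
  s2 = serv.accept                                # accepted side
  s2.close                               # sends FIN to s1

  first = begin
    s1.write("a")                        # only queues data; provokes the RST
  rescue SystemCallError => e
    e
  end
  sleep 0.2                              # assumption: RST has arrived by now
  second = begin
    s1.write("a")                        # the kernel already knows s1 is reset
  rescue SystemCallError => e
    e
  end
  [first, second]
ensure
  s1.close if s1 && !s1.closed?
  serv.close if serv && !serv.closed?
end

first, second = write_twice_after_peer_close
p first    # byte count if the first write succeeded
p second   # an Errno exception object if the second write failed
```

The first element should be the number of bytes written and the second an
Errno exception; which Errno class appears may depend on the platform.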

Now the kernel knows that write() on s1 will not block
(it fails with an error immediately).
So FreeBSD and Solaris report s1 as writable via select(),
but Linux doesn't, as the nil result above shows.
I think this is a problem in Linux.
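
To compare platforms, the select() check can be isolated in a small probe.
This is a sketch under assumptions: the method name writable_after_reset?
is hypothetical, the sleep is a guess at how long the RST needs on
loopback, and the result depends on the operating system and kernel
version, so no particular value is claimed here.

```ruby
require "socket"

# Hypothetical probe (not from the original mail): reports whether
# select() considers the socket writable after the peer has reset it.
def writable_after_reset?
  serv = TCPServer.new("127.0.0.1", 0)   # port 0: let the kernel choose
  s1 = TCPSocket.new("127.0.0.1", serv.addr[1])
  s2 = serv.accept
  s2.close                               # FIN to s1
  s1.write("a")                          # provokes the RST from s2
  sleep 0.2                              # assumption: RST has arrived by now
  begin
    s1.write("a")                        # expected to fail, as in the mail
  rescue Errno::EPIPE, Errno::ECONNRESET
    # ignored: only the select() result below matters for this probe
  end
  # Zero timeout: ask the kernel without waiting.
  !IO.select(nil, [s1], nil, 0).nil?
ensure
  s1.close if s1 && !s1.closed?
  serv.close if serv && !serv.closed?
end

puts "select() reports writable: #{writable_after_reset?}"
```

Running the same probe on FreeBSD, Solaris and Linux would show the
difference described above directly.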
-- 
Tanaka Akira