Issue #7081 has been reported by stevegoobermanhill (stephen gooberman-hill).

----------------------------------------
Bug #7081: GServer orphaned threads lead to resource exhaustion
https://bugs.ruby-lang.org/issues/7081

Author: stevegoobermanhill (stephen gooberman-hill)
Status: Open
Priority: Normal
Assignee: 
Category: 
Target version: 
ruby -v: 1.9.3 and 1.8.7


Hi,
I believe I have located another underlying flaw in GServer: I have observed server threads becoming orphaned, leading to resource allocation (when @connections.size > @@maxConnections).

The cause is that the new service thread is added to @connections from the tcpServerThread, but is removed from @connections from within itself as part of the teardown mechanism. Under certain circumstances (not investigated too deeply, but I work with low power embedded platforms), it is possible that the thread scheduler allows the service thread to run to completion (or at least as far as removing itself from @connections) before the tcpServerThread adds it @connections. The result is that the service thread is never removed from @connections. Over time, this can result in @connections becoming filled with orphaned threads, stopping new connections from being accepted.

There are also some issues in GServer relating to TCPSocket shutdown: There is a TCP timeout parameter called CLOSE_WAIT (default = 7200s = 2hours). This is the maximum time that the TCP socket is kept open if the socket closing protocol aborts incorrectly. This can happen under some circumstances if both ends try to close the socket simultaneously. 

The way the GServer code is written, the service thread TCP socket is closed before the thread is removed from @connections. I am guessing, but I believe that the intention is to limit the number of actively running threads, preventing the system performance from degrading due to processor load. However, while this is implemented, it is done so in an (unintentionally) extremely conservative manner. The TCPSocket#close method can block while waiting for the CLOSE_WAIT timout, which keeps the thread alive, and stops it being removed from @connections. 

I suggest that the service TCPSocket#close method should be called after the service thread is removed from @connections. This has the effect of limiting the GServer to @@maxConnections actively serving threads (ie those that are within GServer#serve or the derrived class #serve method).  

I haven't included patches at this point, as the changes to GServer are actually quite extensive. I have a drop-in replacement class for GServer (called GenServer) that I have extensively tested (I use it as the basis for an actor model for embedded systems deployed in very remote places (think jungles, tops of mountains, etc). GenServer implements:
- changes required to avoid thread orphaning problem as explained above 
- changes required to avoid server blocking while TCPSockets are in CLOSE_WAIT timeout
- non-blocking TCPServerThread loop with an (empty) #heartbeat method callout (which allows the server to check whether it is in a good state (whatever that might mean in your implementation!)
- solves #6369
- logging implemented via Logger rather than hand-rolled
- improved error handling : GServer will attempt to retry rather than shutting down on StandardError (but will shutdown on Exception)

However, while I believe that it is a significant improvement on the existing GServer, I am not yet fully convinced that it is good enough for the standard library (there is one issue in particular around TCPServer#accept that could do with a better solution). 

Let me know if you would like the GenServer code, or a series of patches, or what.

Kind regards

Steve 
  


-- 
http://bugs.ruby-lang.org/