On top of the memory leak issue, I have been trying to track down unhandled 
exceptions in my code.  I have run across some very strange behavior that I 
will try to explain.

Problem(?) code (line numbers from bsn_a.rb)

  141  def alive?
  142    t = TCPSocket.new(@host, @port)
  143    return true
  144
  145  rescue Errno::ETIMEDOUT
  146    @exception = " Timed out (#{@host}:#{@port})"
  147  rescue SocketError => e
  148    @exception = " Socket error - #{e}"
  149  rescue Exception => e
  150    @exception = e
  151    return false
  152  end

So, in a test driver, it all works as expected with junk data:

 11:42 (kant)$ ruby test.rb
  .. trying to go to Foo (Foo:10.10.10.5:foo:bar)
  -->  failed  Foo:10.10.10.5:foo:bar
  .. trying to go to Foo (Bar:10.10.10.6:foo:bar)
  -->  failed  Bar:10.10.10.6:foo:bar

However, in my actual program, something really bizarre happens:

 11:43 (kant)$ ruby healthcollect.rb -g -n eeua.txt -c flow.txt -d data
  .. Running 10 commands on 2 nodes.
  .. Data going into directory --> data/20050210_1145_eeua
  .. processing the nodes... (thread count=35)
  .. threading now ...
  .. trying to go to Foo (Foo:10.10.10.5:foo:bar)
  Exception `SocketError' at ./bsn_a.rb:142 - getaddrinfo: hostname nor
  servname provided, or not known
  .. trying to go to Bar (Bar:10.10.10.6:foo:bar)
  Exception `SocketError' at ./bsn_a.rb:142 - getaddrinfo: hostname nor
  servname provided, or not known
  Exception `SocketError' at /usr/local/lib/ruby/1.8/net/telnet.rb:352 -  
   getaddrinfo: hostname nor servname provided, or not known
  Exception `SocketError' at /usr/local/lib/ruby/1.8/net/telnet.rb:352 -
   getaddrinfo: hostname nor servname provided, or not known
  Exception `SocketError' at /usr/local/lib/ruby/1.8/net/telnet.rb:360 -
   getaddrinfo: hostname nor servname provided, or not known
  -->  failed  Foo:10.10.10.5:foo:bar
  Exception `SocketError' at /usr/local/lib/ruby/1.8/net/telnet.rb:360 -
   getaddrinfo: hostname nor servname provided, or not known
  -->  failed  Bar:10.10.10.6:foo:bar

Telnet is throwing a 'SocketError' and line 142 is throwing one, and neither 
is being caught!
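
One thing worth ruling out: the "Exception `X' at file:line - message" lines 
look like what the interpreter writes to stderr whenever an exception is 
raised while $DEBUG is true (for example under `ruby -d`), even when a rescue 
clause handles it afterwards.  Whether healthcollect.rb or one of its options 
turns $DEBUG on is an assumption here, not something the output confirms.  A 
minimal sketch, with a hypothetical helper named guarded_lookup:

```ruby
require 'socket'  # defines SocketError

# Hypothetical helper: raises and rescues a SocketError entirely inside itself.
def guarded_lookup
  raise SocketError, "simulated lookup failure"
rescue SocketError
  :rescued
end

$DEBUG = true          # the same flag that `ruby -d` turns on
begin
  # The raise is reported on stderr here, although it is rescued.
  result = guarded_lookup
ensure
  $DEBUG = false
end

puts "result: #{result.inspect}"
```

If the sketch behaves the same way on the affected machine, the messages 
would only show that an exception was raised, not that it escaped the rescue 
clauses.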

Now, if I comment out lines 147-148, I get the following from the program:

 11:44 (kant)$ ruby healthcollect.rb -g -n eeua.txt -c flow.txt -d data
  .. Running 10 commands on 2 nodes.
  .. Data going into directory --> data/20050210_1144_eeua
  .. processing the nodes... (thread count=35)
  .. threading now ...
  .. trying to go to Foo (Foo:10.10.10.5:foo:bar)
  Exception `SocketError' at ./bsn_a.rb:142 - getaddrinfo: hostname nor
  servname provided, or not known
  .. trying to go to Bar (Bar:10.10.10.6:foo:bar)
  Exception `SocketError' at ./bsn_a.rb:142 - getaddrinfo: hostname nor
  servname provided, or not known
  -->  failed  Bar:10.10.10.6:foo:bar
  -->  failed  Foo:10.10.10.5:foo:bar

So, it throws the exception at line 142, but the Telnet exception goes away!?!
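
A possible explanation to check, offered as a sketch rather than a diagnosis: 
a rescue clause evaluates to its last expression, so the SocketError branch 
in alive? returns the message string (which is truthy), while only the 
`rescue Exception` branch returns false.  If the caller uses the result of 
alive? to decide whether to open the Telnet session (an assumption about code 
not shown here), removing lines 147-148 would change that decision.  The 
class name HostProbe and its constructor argument are hypothetical:

```ruby
require 'socket'  # defines SocketError

# Hypothetical stand-in for the class in bsn_a.rb; it simulates the failure
# instead of opening a real socket.
class HostProbe
  attr_reader :exception

  def initialize(error_class = nil)
    @error_class = error_class
  end

  def alive?
    raise @error_class, "simulated failure" if @error_class
    true
  rescue SocketError => e
    # Last expression of this branch is the String, so alive? returns it.
    @exception = " Socket error - #{e}"
  rescue Exception => e
    @exception = e
    return false
  end
end

socket_result  = HostProbe.new(SocketError).alive?
generic_result = HostProbe.new(RuntimeError).alive?

puts "SocketError branch returned:  #{socket_result.inspect}"
puts "Exception branch returned:    #{generic_result.inspect}"
```

Ending every rescue branch with an explicit `false` would make the method 
return the same value for all failures.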

Can anyone shed any light on what is happening here?  I really have no clue 
how to proceed at this point.

As far as I can tell, the test driver is an accurate model of the 'real' 
program -- it is threaded, has the same class hierarchy, and includes the 
same libraries; it just doesn't have all the pre- and post-processing in it.  
Both include the same 'bsn_a.rb'.

11:52 (kant)$ ruby -v
ruby 1.8.2 (2004-12-25) [i386-freebsd5.3]

Regards,

-- 
-mark.  (probertm at acm dot org)