I'm trying to use the PTY extension to run ssh so that I can communicate with 
a program on a remote host.

I tried using the block form:

require 'pty'

begin
  # r = read end, w = write end, p = pid of the spawned ssh
  PTY.spawn("ssh ...") { |r,w,p| ... }
rescue RuntimeError
  # the ssh died unexpectedly
end

If I do this, and ssh does not die unexpectedly within the block, I can only 
run this 16 times, after which I get an error from the PTY extension that too 
many ptys are open.

I traced this to the fact that, if the block form is used, the signal handler 
is reset once the block returns, so the chld_changed signal handler is never 
called to clean up the child_pid table.  I believe this is a bug, but I don't 
have time at the moment to try to fix it.  (I'm using the PTY extension that 
comes with ruby-1.6.3.)

Of course, if ssh dies within the block, the signal handler is called, the 
child_pid table is cleaned up, and the exception is raised.  This allows me to 
run the code above more than 16 times.

So, I converted to use the non-block form:

begin
  r,w,p = PTY.spawn("ssh ...")
  ...
  sleep 5 # wait for the process to die
rescue RuntimeError
  # ignore it because the ssh died
end

Doing it this way allows me to run this more than 16 times.

The problem is, if something odd happens, I just want to close the connection 
and continue on.  Therefore the code became something like this:

p = nil
begin
  begin
    r,w,p = PTY.spawn("ssh ...")
    ...
    # if there's a problem: raise MyError
    ...
  rescue MyError
    ...
  ensure
    if p
      sleep 5 # wait for it to die by itself
      Process.kill('KILL', p)
      sleep 5 # wait for the kill to take effect
    end
  end
rescue RuntimeError
  # ignore it because ssh died
end

Before anyone asks, I tried Process.waitpid(p) instead of the second "sleep 
5", but when I did that, the signal handler was never called to clean up the 
child_pid table and I had the 16-run problem again (except this time, it was 
16 runs where MyError got raised).

Anyhow, is there any better way of doing this?  The sleep thing seems like 
something of a kludge to get this to work.

On the other hand, the SIGKILL should always succeed (kill -9), so maybe I'm 
just worrying too much.  If that is the case, does anyone think 5 seconds is 
too short for the second "sleep 5"?  I want to be sure that I catch the 
RuntimeError raised by the chld_changed signal handler in this section of the 
code so as not to disrupt the rest of my program.
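
One possible alternative to guessing at a sleep length would be to poll until 
the pid is gone.  This is only a minimal sketch and has not been tried against 
ruby-1.6.3: wait_until_gone is a made-up helper, not part of PTY, and it 
assumes that something else (here, PTY's chld_changed handler) reaps the 
child, since signal 0 only checks whether the pid still exists.  Any 
RuntimeError raised by the handler would still propagate out of the loop.

```ruby
# Hypothetical helper (not part of the PTY extension).
# Polls until `pid` no longer exists or `timeout` seconds have passed.
# Returns true if the process is gone, false if it is still there.
def wait_until_gone(pid, timeout = 5, interval = 0.1)
  deadline = Time.now + timeout
  while Time.now < deadline
    begin
      # Signal 0 delivers nothing; it only checks that the pid can be
      # signalled.  It does not reap the child, so an unreaped (zombie)
      # child still counts as existing.
      Process.kill(0, pid)
    rescue Errno::ESRCH
      return true   # no such process: it has exited and been reaped
    end
    sleep interval
  end
  false
end
```

In the ensure clause this could stand in for each "sleep 5", for example 
Process.kill('KILL', p) unless wait_until_gone(p), followed by a second 
wait_until_gone(p) after the kill.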

Any help or advice would be appreciated.

Thanks,
Henry.