I'm trying to use the PTY extension to run ssh and communicate with a program
on a remote host.
I first tried the block form:
  begin
    PTY.spawn("ssh ...") { |r,w,p| ... }
  rescue RuntimeError
    # the ssh died unexpectedly
  end
If I do this, and ssh does not die unexpectedly within the block, I can only
run this 16 times, after which I get an error from the PTY extension that too
many ptys are open.
I traced this to the fact that, if the block form is used, the signal handler
is reset after yielding, and the chld_changed signal handler is never called
to clean up the child_pid table. I believe this is a bug, but I don't have
time at the moment to try and fix it. (I'm using the PTY extension that comes
with ruby-1.6.3)
Of course, if ssh dies within the block, the signal handler is called, the
child_pid table is cleaned up, and the exception is raised. This allows me to
run the code above more than 16 times.
So, I converted to use the non-block form:
  begin
    r,w,p = PTY.spawn("ssh ...")
    ...
    sleep 5 # wait for the process to die
  rescue RuntimeError
    # ignore it because the ssh died
  end
Doing it this way allows me to run this more than sixteen times.
The problem is, if something odd happens, I just want to close the connection
and continue on. Therefore the code became something like this:
  p = nil
  begin
    begin
      r,w,p = PTY.spawn("ssh ...")
      ...
      # if there's a problem: raise MyError
      ...
    rescue MyError
      ...
    ensure
      if p
        sleep 5 # wait for it to die by itself
        Process.kill('KILL', p)
        sleep 5 # wait for the kill to take effect
      end
    end
  rescue RuntimeError
    # ignore it because ssh died
  end
Before anyone asks: I tried Process.waitpid(p) instead of the second "sleep
5", but when I did that, the signal handler was never called to clean up the
child table, and I had the 16-run problem again (except this time it was 16
runs in which MyError was raised).
Anyhow, is there any better way of doing this? The sleep thing seems like
something of a kludge to get this to work.
On the other hand, the SIGKILL should always succeed (kill -9) so maybe I'm
just being overly worried. If this is the case, does anyone think 5 seconds
for the second "sleep 5" is not enough? I want to be sure that I catch the
RuntimeError raised by the chld_changed signal handler in this section of the
code so as not to disrupt the rest of my program.
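For comparison, here is a minimal sketch of what the cleanup could look like
without fixed sleeps. It assumes a Ruby in which reaping the child directly is
safe, which is NOT the case with the PTY extension in ruby-1.6.3 as described
above. reap_child is a hypothetical helper name, and the grace and step
parameters are my own invention; pid would be the third value returned by the
non-block form of PTY.spawn.

```ruby
# Hypothetical helper (not part of PTY): give child `pid` up to `grace`
# seconds to exit on its own, then send SIGKILL and reap it.
# Returns the child's Process::Status.
# Assumption: nothing else (such as a SIGCHLD handler) reaps this child;
# if something does, waitpid2 raises Errno::ECHILD.
def reap_child(pid, grace = 5, step = 0.1)
  deadline = Time.now + grace
  loop do
    # With WNOHANG, waitpid2 returns nil while the child is still running.
    _, status = Process.waitpid2(pid, Process::WNOHANG)
    return status if status
    break if Time.now >= deadline
    sleep step
  end
  Process.kill('KILL', pid)          # SIGKILL cannot be caught or ignored
  _, status = Process.waitpid2(pid)  # blocks until the kill has taken effect
  status
end
```

In the ensure clause this would replace both sleeps and the kill with a single
"reap_child(p) if p". The blocking wait after the kill returns as soon as the
child is gone, so there is no need to guess how long a sleep is long enough.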
Any help or advice would be appreciated.
Thanks,
Henry.