Hi,

Has anyone seen anything like this?

load averages:  0.63,  0.30,  0.16                                    02:07:51
134 processes: 118 sleeping, 14 zombie, 2 on cpu
CPU states: 77.0% idle, 16.8% user,  5.6% kernel,  0.6% iowait,  0.0% swap
Memory: 4096M real, 2984M free, 2952M swap in use, 2357M swap free

  PID USERNAME THR PRI NICE  SIZE   RES STATE   TIME    CPU COMMAND
 6683 root       1   0    4 1744M   94M cpu0    1:18 16.93% healthcollect.r
28954 bwczkdj    1  58    0 2608K 1792K sleep  10:55  0.49% top
 ...

Then, 13 minutes later:

load averages:  0.57,  0.65,  0.48                                    02:20:11
139 processes: 124 sleeping, 14 zombie, 1 on cpu
CPU states: 93.9% idle,  3.7% user,  2.4% kernel,  0.0% iowait,  0.0% swap
Memory: 4096M real, 2678M free, 5294M swap in use, 15M swap free

  PID USERNAME THR PRI NICE  SIZE   RES STATE   TIME    CPU COMMAND
 6683 root       1  20    4 4082M  291M sleep   9:15 12.43% healthcollect.r
 ...

Notice how the reported size of the Ruby process grows from 1744M to 4082M 
(resident size from 94M to 291M) between the two snapshots, while free swap 
drops from 2357M to 15M.  Eventually all the swap is consumed and the process 
dies.  It looks like a memory leak somewhere, though I could well be mistaken.  
A process reporting that it uses 4 GB of memory certainly raises my eyebrows.
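
One way to check whether the growth comes from Ruby objects piling up (as 
opposed to memory held outside the object heap) might be to take an object 
census at intervals and compare the counts.  The following is only a minimal 
sketch: object_census and census_growth are hypothetical helper names, not 
part of the script, and it assumes ObjectSpace.each_object is cheap enough to 
call occasionally in the running process.

```ruby
# Count live objects per class so that two snapshots can be compared.
def object_census
  counts = Hash.new(0)
  ObjectSpace.each_object(Object) { |obj| counts[obj.class] += 1 }
  counts
end

# Return [class, delta] pairs for every class whose live-object count grew
# between two snapshots, largest growth first.
def census_growth(before, after)
  growth = {}
  after.each do |klass, count|
    delta = count - before.fetch(klass, 0)
    growth[klass] = delta if delta > 0
  end
  growth.sort_by { |_klass, delta| -delta }
end
```

If the counts stay flat while SIZE keeps climbing, the growth is probably not 
in ordinary Ruby objects, which would at least narrow down where to look.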

As background, this script runs data collection against about 220 network 
elements.  The script is multithreaded, with the current thread count at 15.  
Each thread has a dedicated telnet session to the network element it is 
collecting from.  The data collected is written straight to disk and 
post-processed afterwards.
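
To make the structure concrete, here is a minimal sketch of that kind of 
arrangement, assuming a shared work queue feeding a fixed pool of threads.  
It is not the actual script: collect_all is a hypothetical name, the telnet 
session and the disk write are left to the caller's block, and the real code 
may be organised differently.

```ruby
require 'thread'

# Call the given block once per element, using a fixed pool of worker
# threads.  Nothing the block returns is retained, so collected data has to
# be written out (for example to disk) inside the block itself.
# Returns the number of elements that were queued.
def collect_all(elements, thread_count = 15)
  queue = Queue.new
  elements.each { |element| queue << element }

  workers = (1..thread_count).map do
    Thread.new do
      loop do
        begin
          # Non-blocking pop raises ThreadError once the queue is empty.
          element = queue.pop(true)
        rescue ThreadError
          break
        end
        yield element
      end
    end
  end

  workers.each { |worker| worker.join }
  elements.size
end
```

In a layout like this no per-element data is kept after the block returns, 
which is why the steady growth in SIZE is surprising.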

I can't change the swap size at all, and I can't easily upgrade the hardware 
or the Ruby version.

$ ruby -v
ruby 1.8.0 (2003-08-04) [sparc-solaris2.8]

Regards,

-- 
-mark.  (probertm at acm dot org)