On Sat, 26 Aug 2006, Zed Shaw wrote:

> Howdy Folks,
>
> This release is after painstaking analysis of a memory leak that was
> reported by Bradley Taylor, reduced by myself, and then fixed after much
> work.  You should all thank Bradley for finding the bizarre fix.
>
> It turns out that Ruby has a memory leak when you use pretty much any
> thread locking primitive other than Sync (Mutex, Monitor, etc.):
>
> http://pastie.caboo.se/10194
>
> The fix (for whatever reason) is to use Sync and put it in a block:
>
> http://pastie.caboo.se/10317
>
> Those two scripts are mini versions of how Mongrel manages threads so
> that I could figure out a solution or get some input.  The graph is
> reported RAM usage, sampled once per second.  As you can see, the first
> (leaking) graph goes up and doesn't come down, while the second (fixed)
> graph cycles properly.
>
> ** This is a Ruby issue, so if you have software using Mutex or Monitor,
> change to Sync now. **
>
> Tests of this latest pre-release show that the RAM is properly cycled by
> the GC and that it's actually finally solved.  If you run your app using
> this release and you still have a leak then use the memory debugging
> tools mongrel has to rule out your code (see below).

hi zed-

if you are really serious about fixing your leak i suggest you re-work your
tests.  as i mentioned before, they have several race conditions, not least of
which is that they both start a random number of threads, not 1000 as the code
suggests (you can easily confirm by printing out the number of times the
thread init loop executes).  further, sync.rb is the single ruby lib i've had
memory issues with on production systems.  i have never managed to figure out
why that is...
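
a minimal sketch of that thread-count check, assuming nothing about the
original pastie scripts: count_started_threads is a hypothetical helper that
starts n threads, bumps a mutex-guarded counter inside each one, and reports
how many actually ran.  dropping a similar counter into a test's thread init
loop shows whether it really creates the number of threads it claims:

```ruby
require 'thread'

# hypothetical helper, not taken from either pastie script:
# start n threads and count how many actually ran their body
def count_started_threads(n)
  started = 0
  guard = Mutex.new

  threads = Array.new(n) do
    Thread.new do
      # the counter is shared, so only touch it while holding the lock
      guard.synchronize { started += 1 }
    end
  end

  # wait for every thread to finish before reading the counter
  threads.each(&:join)
  started
end

puts "requested 50, started #{ count_started_threads(50) }"
```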

in any case a careful script which allocates memory in each thread, waits for all
threads to finish allocation, checks memory, and then kills all threads before
checking again shows some surprising results which you should read carefully:


using mutex shows a nice cycle of memory freed:

     harp:~ > cat a.rb.mutex
     using: Mutex
     n: 420
     iter: 0
     with 420 threads holding memory : 44.0%
     with 0 threads holding memory : 13.0%
     iter: 1
     with 420 threads holding memory : 43.9%
     with 0 threads holding memory : 13.0%
     iter: 2
     with 420 threads holding memory : 44.1%
     with 0 threads holding memory : 13.3%
     iter: 3
     with 420 threads holding memory : 44.1%
     with 0 threads holding memory : 13.2%
     iter: 4
     with 420 threads holding memory : 44.0%
     with 0 threads holding memory : 13.5%
     iter: 5
     with 420 threads holding memory : 44.1%
     with 0 threads holding memory : 13.2%
     iter: 6
     with 420 threads holding memory : 43.9%
     with 0 threads holding memory : 13.2%
     iter: 7
     with 420 threads holding memory : 44.2%
     with 0 threads holding memory : 13.2%
     iter: 8
     with 420 threads holding memory : 44.1%
     with 0 threads holding memory : 13.5%
     iter: 9
     with 420 threads holding memory : 44.1%
     with 0 threads holding memory : 13.9%

using sync, on the other hand, looks leaky, though i'm not saying it is.

     harp:~ > cat a.rb.sync
     using: Sync
     n: 420
     iter: 0
     with 420 threads holding memory : 43.8%
     with 0 threads holding memory : 1.0%
     iter: 1
     with 420 threads holding memory : 43.8%
     with 0 threads holding memory : 2.0%
     iter: 2
     with 420 threads holding memory : 43.8%
     with 0 threads holding memory : 2.7%
     iter: 3
     with 420 threads holding memory : 43.8%
     with 0 threads holding memory : 3.5%
     iter: 4
     with 420 threads holding memory : 43.8%
     with 0 threads holding memory : 3.8%
     iter: 5
     with 420 threads holding memory : 43.8%
     with 0 threads holding memory : 4.6%
     iter: 6
     with 420 threads holding memory : 43.8%
     with 0 threads holding memory : 5.4%
     iter: 7
     with 420 threads holding memory : 43.8%
     with 0 threads holding memory : 6.4%
     iter: 8
     with 420 threads holding memory : 43.8%
     with 0 threads holding memory : 7.2%
     iter: 9
     with 420 threads holding memory : 43.7%
     with 0 threads holding memory : 8.1%

here is the code.  note that it's quite careful to only create a fixed number
of threads, to wait for them to each init a mb of memory, and only then to
check memory usage.  likewise for checking after killing all threads - it's
done immediately after killing threads and running gc:

     harp:~ > cat a.rb
     require 'thread'
     require 'sync'

     class TestThreads
       def initialize which, n
         c = case which
           when /mutex/io
             Mutex
           when /sync/io
             Sync
         end
         @guard = c.new
         @n = Integer n
         puts "using: #{ c.name }"
         puts "n: #{ @n }"
       end

       def pct_mem # linux specific field pos i'm sure
         stdout = `ps v #{ Process.pid }`
         stdout.split(%r/\n/).last.strip.split(%r/\s+/)[8] + '%'
       end

       def tq
         q = Queue.new
         t = Thread.new{
           mb = @guard.synchronize{ 0.chr * (2 ** 20) }
           q.push :ready
           Thread.stop
         }
         [t, q]
       end

       def run
         list = []
         10.times do |i|
           puts "iter: #{ i }"

           # load @n threads up
           @n.times{ list << tq }

           # wait for all threads to init memory with mb of data
           list.each{|t,q| q.pop}

           # show memory usage
           GC.start
           puts "with #{ list.size } threads holding memory : #{ pct_mem }"

           # kill all threads - clean up
           list.each{|t,q| t.kill}
           list.clear
           sleep 1 until Thread.list.size == 1

           # show memory usage
           GC.start
           puts "with 0 threads holding memory : #{ pct_mem }"
         end
       end
     end

     $VERBOSE = nil
     STDOUT.sync = true
     Thread.abort_on_exception = true
     trap('INT'){ exit }

     which, n, ignored = ARGV
     TestThreads.new(which, n).run
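
since pct_mem above relies on %MEM being the ninth field of `ps v` output,
here is a hedged sketch of a variant that finds the column by its header name
instead.  pct_mem_from is a hypothetical helper, and the sample ps text is made
up for illustration; it is not output from the runs above:

```ruby
# hypothetical variant of pct_mem: locate the %MEM column by header name
# rather than by a fixed field position
def pct_mem_from(ps_output)
  # first line is the header, remaining lines are process rows
  header, *rows = ps_output.lines.map { |line| line.strip.split(/\s+/) }
  return nil if header.nil? || rows.empty?

  idx = header.index('%MEM')
  return nil if idx.nil?

  # %MEM comes before COMMAND, so spaces in the command don't shift it
  rows.last[idx].to_f
end

# made-up sample in the shape of linux `ps v` output
sample = <<~PS
    PID TTY      STAT   TIME  MAJFL   TRS   DRS   RSS %MEM COMMAND
   4242 pts/0    S+     0:00      0     2 12345  6789  1.3 ruby a.rb
PS

puts pct_mem_from(sample)
```

in the script above it could be fed the same `ps v` output that pct_mem
already collects for the current pid.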


in any case, i'd carefully examine your tests (or the rails code if that is
indeed what it's modeled after) to make sure that they test
Mutex/Sync/Thread/Ruby and not your os virtual memory system and look closely
at the results again - like i said, i have had issues with sync.rb.

the point here is that it is probably the code in question and not Mutex per
se that was causing your process to grow in vmsize.

regards.

-a
-- 
to foster inner awareness, introspection, and reasoning is more efficient than
meditation and prayer.
- h.h. the 14th dalai lama