On Sat, 26 Aug 2006, Zed Shaw wrote:

> Howdy Folks,
>
> This release is after painstaking analysis of a memory leak that was
> reported by Bradley Taylor, reduced by myself, and then fixed after much
> work. You should all thank Bradley for finding the bizarre fix.
>
> It turns out the Ruby has a memory leak when you use pretty much any
> thread locking primitive other than Sync (Mutex, Monitor, etc.):
>
> http://pastie.caboo.se/10194
>
> The fix (for whatever reason) is to use Sync and put it in a block:
>
> http://pastie.caboo.se/10317
>
> Those two scripts are mini versions of how Mongrel manages threads so
> that I could figure out a solution or get some input. The graph is
> reported ram usage samples 1/second. As you can see the first leaking
> graph goes up and doesn't go down, the second (fixed) graph cycles
> properly.
>
> ** This is a Ruby issue, so if you have software using Mutex or Monitor,
> change to Sync now. **
>
> Tests of this latest pre-release show that the RAM is properly cycled by
> the GC and that it's actually finally solved. If you run your app using
> this release and you still have a leak then use the memory debugging
> tools mongrel has to rule out your code (see below).

hi zed-

if you are really serious about fixing your leak i suggest you re-work your
tests. as i mentioned before they have several race conditions, not least of
which is that they both start a random number of threads, not 1000 as the code
suggests (you can easily confirm by printing out the number of times the
thread init loop executes). further, sync.rb is the single ruby lib i've had
memory issues with on production systems. i have never managed to figure out
why that is...
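
a minimal sketch of that kind of check, assuming nothing about the pasted
scripts themselves: each thread bumps a counter under a Mutex and reports in
through a Queue, so the parent knows exactly how many thread bodies ran before
it measures anything. the helper name start_threads and the count of 50 are
made up for illustration.

```ruby
require 'thread'

# hypothetical helper (not taken from either pastie): start n threads and
# return [how many thread bodies ran, how many threads were alive]
def start_threads n
  guard = Mutex.new
  ready = Queue.new
  ran   = 0

  threads = Array.new(n){
    Thread.new{
      guard.synchronize{ ran += 1 }  # count this thread body exactly once
      ready.push :ready              # tell the parent we got this far
      sleep                          # park until killed
    }
  }

  n.times{ ready.pop }               # block until every thread has reported
  alive = threads.select{|t| t.alive? }.size

  threads.each{|t| t.kill; t.join }  # clean up before returning
  [ran, alive]
end

ran, alive = start_threads 50
puts "thread bodies run: #{ ran }, threads alive: #{ alive }"
```

if the two numbers ever differ from the count you asked for, the test is
measuring something other than what it claims to.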

in any case a careful script which allocates memory in a thread, waits for all
threads to finish allocation, checks memory, and then kills all threads before
checking again shows some surprising results which you should read carefully:

using mutex shows a nice cycle of memory freed:

  harp:~ > cat a.rb.mutex
  using: Mutex
  n: 420
  iter: 0
  with 420 threads holding memory : 44.0%
  with 0 threads holding memory : 13.0%
  iter: 1
  with 420 threads holding memory : 43.9%
  with 0 threads holding memory : 13.0%
  iter: 2
  with 420 threads holding memory : 44.1%
  with 0 threads holding memory : 13.3%
  iter: 3
  with 420 threads holding memory : 44.1%
  with 0 threads holding memory : 13.2%
  iter: 4
  with 420 threads holding memory : 44.0%
  with 0 threads holding memory : 13.5%
  iter: 5
  with 420 threads holding memory : 44.1%
  with 0 threads holding memory : 13.2%
  iter: 6
  with 420 threads holding memory : 43.9%
  with 0 threads holding memory : 13.2%
  iter: 7
  with 420 threads holding memory : 44.2%
  with 0 threads holding memory : 13.2%
  iter: 8
  with 420 threads holding memory : 44.1%
  with 0 threads holding memory : 13.5%
  iter: 9
  with 420 threads holding memory : 44.1%
  with 0 threads holding memory : 13.9%

using sync, on the other hand, looks leaky, though i'm not saying it is.

  harp:~ > cat a.rb.sync
  using: Sync
  n: 420
  iter: 0
  with 420 threads holding memory : 43.8%
  with 0 threads holding memory : 1.0%
  iter: 1
  with 420 threads holding memory : 43.8%
  with 0 threads holding memory : 2.0%
  iter: 2
  with 420 threads holding memory : 43.8%
  with 0 threads holding memory : 2.7%
  iter: 3
  with 420 threads holding memory : 43.8%
  with 0 threads holding memory : 3.5%
  iter: 4
  with 420 threads holding memory : 43.8%
  with 0 threads holding memory : 3.8%
  iter: 5
  with 420 threads holding memory : 43.8%
  with 0 threads holding memory : 4.6%
  iter: 6
  with 420 threads holding memory : 43.8%
  with 0 threads holding memory : 5.4%
  iter: 7
  with 420 threads holding memory : 43.8%
  with 0 threads holding memory : 6.4%
  iter: 8
  with 420 threads holding memory : 43.8%
  with 0 threads holding memory : 7.2%
  iter: 9
  with 420 threads holding memory : 43.7%
  with 0 threads holding memory : 8.1%

here is the code, note that it's quite careful to only create a fixed number
of threads, to wait for them to each init a mb of memory, and only then to
check memory usage. likewise for checking after killing all threads - it's
done immediately after killing threads and running gc.

here is the code:

  harp:~ > cat a.rb
  require 'thread'
  require 'sync'

  class TestThreads
    def initialize which, n
      c =
        case which
          when /mutex/io
            Mutex
          when /sync/io
            Sync
        end
      @guard = c.new
      @n = Integer n
      puts "using: #{ c.name }"
      puts "n: #{ @n }"
    end

    def pct_mem # linux specific field pos i'm sure
      stdout = `ps v #{ Process.pid }`
      stdout.split(%r/\n/).last.strip.split(%r/\s+/)[8] + '%'
    end

    def tq
      q = Queue.new
      t = Thread.new{
        mb = @guard.synchronize{ 0.chr * (2 ** 20) }
        q.push :ready
        Thread.stop
      }
      [t, q]
    end

    def run
      list = []
      10.times do |i|
        puts "iter: #{ i }"

        # load n threads up
        @n.times{ list << tq }

        # wait for all threads to init memory with mb of data
        list.each{|t,q| q.pop}

        # show memory usage
        GC.start
        puts "with #{ list.size } threads holding memory : #{ pct_mem }"

        # kill all threads - clean up
        list.each{|t,q| t.kill}
        list.clear
        sleep 1 until Thread.list.size == 1

        # show memory usage
        GC.start
        puts "with 0 threads holding memory : #{ pct_mem }"
      end
    end
  end

  $VERBOSE = nil
  STDOUT.sync = true
  Thread.abort_on_exception = true
  trap('INT'){ exit }

  which, n, ignored = ARGV
  TestThreads.new(which, n).run

in any case, i'd carefully examine your tests (or the rails code if that is
indeed what it's modeled after) to make sure that they test
Mutex/Sync/Thread/Ruby and not your os virtual memory system and look closely
at the results again - like i said, i have had issues with sync.rb.

the point here is that it is probably the code in question and not Mutex per
se that was causing your process to grow in vmsize.

regards.

-a
--
to foster inner awareness, introspection, and reasoning is more efficient than
meditation and prayer.
- h.h. the 14th dalai lama