I am doing some large queries with MySQL, and the memory that gets allocated never seems to go back 
to the system. In this test I am querying 47,000 records.

My setup for this test is Ubuntu Breezy, Rails 1.0, MySQL 5.0.18, and a compiled MySQL/Ruby 2.7 driver, 
running in production mode. I have written a helper for my test which monitors object counts and memory 
utilization, and uses a 10% threshold to determine how many objects are sticking around and how many 
are being garbage collected.
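
To make the threshold concrete: a class is only reported as a "threshold breaker" if its count moves 
more than roughly 10% either way from the baseline. Boiled down, the check is roughly this (numbers 
taken from the run below; the full helper is at the bottom of the post):

   factor = 1.0 + 10 / 100.0     # 10% threshold => 1.1 multiplier
   before = 73_743               # Strings when the baseline count was taken
   after  = 900_979              # Strings after the query results were built
   # reported because it falls outside the [before / factor, before * factor] band
   breaker = after > before * factor || after < before / factor   # => true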

On first run...

   Loaded suite test/unit/table1_test
   Started
   String count                             73685
   Building query                           21Mb
   String count                             73745

   Received query results                   89Mb
   String count                             901034
   Threshold breaker String                 (827236) started w/ 73743 ended w/ 900979
   Threshold breaker Table1                 47000 started w/ 0 ended w/ 47000

   Starting GC                              89Mb
   String count                             25041
   Done with GC                             82Mb
   Threshold breaker String                 -48704 started w/ 73743 ended w/ 25039

The first block of output shows that when I started my test, the ruby process was using 21Mb of memory 
and over 70,000 Strings were in existence. After the query results were constructed there were over 
900,000 Strings in existence and the process had grown to 89Mb. The threshold breakers show a gain of 
47,000 Table1 objects and 827,236 Strings since the first object count was captured (which occurred 
right before the original String count).

After garbage collecting, the String count is down 48,704 from when it was first captured, and there 
is no threshold breaker for Table1 because the program started with 0 of them in existence, meaning 
0 Table1 objects are in existence now. However, the process size never seems to drop below 82Mb.

If I query again, memory goes up to 133Mb. After all of the Strings and ActiveRecord models have been 
garbage collected, memory goes down ever so slightly again, and this continues for as long as I keep 
querying.
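
If it helps to picture it, the cycle I keep repeating boils down to a loop like this (just a sketch, 
assuming the Rails environment and the Table1 model are loaded; the RSS read via ps is the same idea 
as the mem_usage helper in the test code below):

   def rss_mb
     # ps reports RSS in Kb
     (`ps -p #{Process.pid} -o rss=`.to_i / 1024.0).round
   end

   3.times do |i|
     records = Table1.find :all, :limit => 47000
     records = nil          # drop the only reference to the result set
     GC.start
     puts "pass #{i + 1}: #{rss_mb}Mb after GC"
   end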

The test is broken out into three methods (Test::Unit runs them in alphabetical order, so they execute 
in the order listed):
  - test_build_mem_usage (count objects, perform the query, store the results in a local variable)
  - test_starting_gc (count objects, GC.start, count objects again)
  - test_z (re-run the first two a few more times, then recount objects now that GC is done)

I guess my biggest unknown at the moment is: as I do large queries, is Ruby just hanging onto that 
space? Why would memory keep growing the next time I do a 47,000 item query, if Ruby already had unused 
space available from my last query?
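
For comparison, a pure-Ruby version of the same allocate/collect/measure cycle, with no MySQL or 
ActiveRecord involved at all, would be something along these lines (again just a sketch; the string 
count and sizes are arbitrary, picked to be in the same ballpark as the query above):

   def rss_mb
     # same RSS-via-ps reading as the loop above
     (`ps -p #{Process.pid} -o rss=`.to_i / 1024.0).round
   end

   puts "before allocation: #{rss_mb}Mb"
   strings = Array.new(900_000) { "x" * 64 }   # roughly the String count the query produced
   puts "after allocation:  #{rss_mb}Mb"
   strings = nil
   GC.start
   puts "after GC:          #{rss_mb}Mb"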

At the bottom of this post are the actual test schema and test code I was using.

Zach

---- start schema ----
create table table1 (
   id int unsigned not null auto_increment,
   description varchar(255),
   store_name varchar(255),
   address1 varchar(40),
   address2 varchar(40),
   city varchar(40),
   state varchar(15),
   zip_code varchar(5),
   primary key( id )
)TYPE=MyISAM;


---- start test code ----
# hook into the Rails environment
require File.dirname(__FILE__) + '/../test_helper'

require 'table1'

class Object
   def count_objects
     # tally every live object by class using ObjectSpace
     objects = Hash.new{ |h,k| h[k]=0 }
     ObjectSpace.each_object{ |obj| objects[obj.class] += 1 }
     objects
   end

   def print_threshold_breakers hsh1, hsh2, threshold
     # threshold is in percentages; convert it to a multiplier (10 => 1.1)
     threshold = 1.0 + threshold / 100.0

     hsh2.each_key do |key|
       max_num = hsh1[key] * threshold
       min_num = hsh1[key] / threshold

       if hsh2[key] > max_num or hsh2[key] < min_num
         putsf "Threshold breaker #{key.to_s}", "(#{hsh2[key]-hsh1[key]}) started w/ #{hsh1[key]} 
ended w/ #{hsh2[key]}"
       end
     end
   end

   def count_objects_for clazz
     c = 0
     ObjectSpace.each_object{ |o| c+=1 if o.is_a? clazz }
     c
   end

   def mem_usage
     # run ps for this process and split the output into lines (header line + data line)
     line_arr = `ps -p #{Process.pid} -F`.split( /\n/ )

     # split the line array into columns of headers and data
     arr1, arr2 = line_arr.map{ |line| line.split( /\s+/ ) }

     # force the same number of elements in arr2 as there are in arr1 by joining any leftover elements
     column_arr = [ arr1 ]
     column_arr << arr2[0 .. arr1.size-2] + arr2[arr1.size-1 .. arr2.size-1].join( ' ' ).to_a

     # get column/data key pair array
     keypair_arr = column_arr.transpose

     # create hash
     hsh = {}
     keypair_arr.each{ |e| hsh[e[0]] = e[1] }

     # grab results from RSS, which are stored in Kb
     (hsh[ 'RSS' ].to_i / 1024.0).round.to_s << "Mb"
   end

   def putsf label, *args
     printf( "%-40.40s %-40s\n", label.to_s, args.join( ' ' ) )
   end

   def print_class_count clazz
     putsf "#{clazz.name} count", count_objects_for( clazz )
   end


end


class TableTest < Test::Unit::TestCase

   def test_build_mem_usage
     # baseline counts and memory before the query
     print_class_count String
     putsf 'Building mem usage', mem_usage
     h1 = @@h1 = count_objects
     print_class_count String

     # pull back all 47,000 records and hold them in a local variable
     records = Table1.find :all, :limit=>47000
     @@oid = records.object_id
     h2 = count_objects

     putsf 'Done building mem usage', mem_usage
     print_class_count String

     print_threshold_breakers h1, h2, 10
     sleep 2
     puts
   end

   def test_starting_gc
     putsf 'Starting GC', mem_usage

     h1 = count_objects
     GC.start

     h2 = @@h2 = count_objects

     print_class_count String
     putsf 'Done with GC', mem_usage
     print_threshold_breakers h1, h2, 10
     puts
   end

   def test_z
     # run the query/GC cycle a few more times
     test_build_mem_usage
     test_starting_gc

     test_build_mem_usage
     test_starting_gc

     test_build_mem_usage
     test_starting_gc

     # final counts now that everything has been garbage collected
     print_class_count String
     print_class_count Table1
     putsf 'Done', mem_usage
     print_threshold_breakers @@h1, @@h2, 10
     puts

     # check whether the record array from test_build_mem_usage is still reachable
     ObjectSpace.each_object{ |obj| puts "FOUND THE RECORD ARRAY " if obj.object_id == @@oid }
#    ObjectSpace.each_object { |obj| puts obj if obj.is_a? String }
   end


end