I have been posting to the ruby-talk mailing list about Ruby memory and GC, and I think the question 
is ready for the ruby-core mailing list. The thread topic was "Memory Question". www.ruby-lang.org 
appears to be down at the moment; otherwise I'd post links to the mailing list archives.

Here is my latest post, followed by some input from H. Yamamoto:

Well, I did some digging, and I'm left with more questions. The following script does NOT collect the 
800,000 strings it creates (it doesn't use MySQL, Rails or anything else, just Ruby core). I ran it 
alongside a couple of other scripts that use 150 MB+ of memory, to see whether the OS would somehow 
force Ruby to GC more aggressively. It didn't, and swap was used. On my system this script 
consistently used 42 MB of memory, and the string count never went below 800,000. Why would the 
strings not get GC'd when they are out of scope?

----------test1.rb----------------
def count_objects_for clazz
   c = 0
   ObjectSpace.each_object{ |o| c+=1 if o.is_a? clazz }
   c
end

class A
  def run
    arr = []
    800000.times { arr << "d" * 7 }

    puts "check now"
    sleep 10
  end
end

puts count_objects_for( String )
A.new.run
puts count_objects_for( String )
GC.start
puts count_objects_for( String )

Thread.new do
   loop do
     puts count_objects_for( String )
     GC.start
     sleep 5
   end
end.join
-------end test1.rb-------------


Second test: the same script as above, with a slight modification to call Array#clear. This one 
actually collects the Strings. Here I am explicitly removing the references by clearing the Array 
arr. But in the first script I don't reference arr or its value anywhere outside of the method it is 
in, so after that call returns shouldn't it be fair game for the garbage collector?

-------- test2.rb-----------
def count_objects_for clazz
   c = 0
   ObjectSpace.each_object{ |o| c+=1 if o.is_a? clazz }
   c
end

class A
  def run
    arr = []
    800000.times { arr << "d" * 7 }
    puts "check now"
    sleep 10
    arr.clear
  end
end

puts count_objects_for( String )
A.new.run
puts count_objects_for( String )
GC.start
puts count_objects_for( String )

Thread.new do
   loop do
     puts count_objects_for( String )
     GC.start
     sleep 5
   end
end.join
-------end test2.rb-------------
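
As an aside, the counting helper used in both scripts can be written more compactly. This is only a 
sketch of an equivalent helper, not a change to the measurements: it relies on 
ObjectSpace.each_object accepting a class or module argument and returning the number of objects it 
visited.

```ruby
# Equivalent counting helper: each_object filters by class (kind_of?)
# and returns the number of objects it yielded.
def count_objects_for(clazz)
  ObjectSpace.each_object(clazz) { }
end

puts count_objects_for(String)
```

Filtering inside each_object avoids calling is_a? on every live object from Ruby code, so the count 
itself creates less work while the test is running.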

I also modified this second script to run with 8 million strings. When the Strings are GC'd the ruby 
process's memory usage goes down, but not as much as I'd expect. With 8 million strings I see 
330+ MB of memory in use, but after they are GC'd memory only seems to drop to 191 MB, where I would 
expect it to fall back to the single or double digits, perhaps 15 MB or 20 MB. Any reason why ruby 
doesn't let go? Is this a problem? So far I have run this with ruby 1.8.3 (2005-06-23) [i486-linux] 
and ruby 1.8.4 (2005-12-24) [powerpc-linux], with similar results.
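
For anyone who wants to reproduce these numbers without watching top, here is a minimal sketch that 
prints the process's resident memory from inside the script. It assumes a Linux system with 
/proc/self/status available (both machines above are Linux); rss_kb is a hypothetical helper name, 
not something from the original scripts, and it returns nil on systems without /proc.

```ruby
# Hypothetical helper: resident set size of the current process in kB,
# read from the VmRSS line of /proc/self/status (Linux only).
# Returns nil when the file or the line is not available.
def rss_kb
  path = "/proc/self/status"
  return nil unless File.readable?(path)
  line = File.read(path).split("\n").find { |l| l =~ /\AVmRSS:/ }
  line ? line[/\d+/].to_i : nil
end

puts "start:       #{rss_kb.inspect} kB"
arr = Array.new(100_000) { "d" * 7 }   # smaller than the 800,000 used above
puts "after alloc: #{rss_kb.inspect} kB"
arr.clear
GC.start
puts "after GC:    #{rss_kb.inspect} kB"
```

The figures this prints are whatever the kernel reports for the process, so they include the 
interpreter itself and will differ between machines and ruby versions.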



Here is H. Yamamoto's response:




Hello.

Ruby's GC is conservative, so there is no guarantee object is freed even if
object is not reachable from GC's root.

But anyway, probably I found the problem on ELTS_SHARED.

/////////////////////////////////

def pause
   GC.start
   $stdout.puts "measure memory and hit any key..."
   $stdout.flush
   $stdin.getc
end

pause

a = Array.new(10000){ "." * 1000 } # huge memory

pause

a.map!{|s| s[-100..-1]} # memory stays large

pause

a.map!{|s| s[-3..-1]} # reduces memory

pause


/////////////////////////////////

This is because rb_str_substr (string.c) 's

     else if (len > sizeof(struct RString)/2 &&
	beg + len == RSTRING(str)->len && !FL_TEST(str, STR_ASSOC)) {
	str2 = rb_str_new3(rb_str_new4(str));
	RSTRING(str2)->ptr += RSTRING(str2)->len - len;
	RSTRING(str2)->len = len;
     }

is executed at

   a.map!{|s| s[-100..-1]} # memory stays large

rb_str_new3 generates ELTS_SHARED RString which holds original RString.

When original string becomes unreachable, it should be garbage collected.
But ELTS_SHARED substring references it (RString#aux->shared), so not collected
until substring itself becomes unreachable.

I haven't confirmed this is really cause of your problem, but there is possibility
this hidden huge string eats memory. (maybe same thing happens on Array)
---- end response ----
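
One way to probe this hypothesis is to force each substring into its own buffer and see whether 
memory then drops at the "memory stays large" step. The sketch below is an assumption-laden 
experiment, not a confirmed fix: detached_tail is a hypothetical helper, and it assumes that 
String#+ always allocates a fresh buffer for its result rather than sharing the original's.

```ruby
# Hypothetical helper: return the last n characters of s as a string that
# (by assumption) does not share a buffer with s.
def detached_tail(s, n)
  tail = s[-n..-1]   # may be an ELTS_SHARED view onto the original string
  tail + ""          # String#+ builds a new string and copies the bytes
end

a = Array.new(10_000) { "." * 1000 }     # huge memory, as in the script above
a.map! { |s| detached_tail(s, 100) }     # originals should now be unreferenced
GC.start
puts a.first.length
```

If memory falls after the map! here but not with the plain s[-100..-1] version, that would support 
the idea that the shared substrings are keeping the original strings alive.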

Could anyone enlighten me here, or confirm, as H. Yamamoto suspects, that there is a problem?

Thanks,

Zach