I am using Mechanize for several projects that require me to download large 
numbers of HTML pages from a web site. Since I am working with about 1,000 
pages, the limitations of Mechanize started to appear...

Try this code:

################################################
require 'rubygems'
require 'mechanize'

agent = WWW::Mechanize.new

prev = 0
curr = 0
prev_pages = 0
curr_pages = 0

1000.times do 
	page = agent.get("http://yourfavoritepage.com")
	curr = 0
	curr_pages = 0
	# Count the total number of objects and the number of
	# WWW::Mechanize::Page objects.
	ObjectSpace.each_object { |o|
		curr += 1
		curr_pages += 1 if o.class == WWW::Mechanize::Page
	}
	puts "There are #{curr} (#{curr - prev}) objects"
	puts "There are #{curr_pages} (#{curr_pages - prev_pages}) objects"
	prev = curr
	prev_pages = curr_pages
	GC.enable
	GC.start		# Force a collection so anything collectable is reclaimed
	sleep 1.0		# This keeps the script from taking 100% CPU
end

############################################
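
As a side note, ObjectSpace.each_object also accepts a class argument and, 
when given a block, returns the number of objects it yielded, so the Page 
count can be written more compactly:

################################################
# Count only the WWW::Mechanize::Page instances; with a block,
# each_object returns the number of objects it yielded.
pages = ObjectSpace.each_object(WWW::Mechanize::Page) { }
puts "There are #{pages} WWW::Mechanize::Page objects"
################################################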

The output of this script reveals that at each iteration a 
WWW::Mechanize::Page object gets created (along with a lot of other objects) 
and that they never get garbage-collected. So you can watch your RAM fly 
away with each iteration and never come back.

Now this can be solved by moving the agent = WWW::Mechanize.new line inside 
the block, like this:

############################################

1000.times do 
	agent = WWW::Mechanize.new	# <-- CHANGE IS HERE
	page = agent.get("http://yourfavoritepage.com")
	curr = 0
	curr_pages = 0
	# Count the total number of objects and the number of WWW::Mechanize::Page

	..... the rest is the same
#############################################


With this change the number of WWW::Mechanize::Page objects never grows 
beyond three, and the other objects increase and decrease on the order of 
60 per iteration.


Does this mean that the WWW::Mechanize object keeps references to all the 
pages it has downloaded, and that those pages will not be garbage-collected 
as long as the WWW::Mechanize object is alive?
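
If it is the agent's page history that keeps those references, then capping 
the history should bound the memory use. A minimal sketch, assuming 
WWW::Mechanize exposes a max_history setting (I have not verified this 
against the docs):

############################################
agent = WWW::Mechanize.new
# Assumption: max_history limits how many visited pages the agent
# keeps referenced; older pages should then become collectable.
agent.max_history = 1
############################################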

In my script I cannot recreate the WWW::Mechanize object, since the page in 
question sits behind a form and requires cookie state information to be 
able to access the pages I need to download. Is there a way to tell the 
Mechanize object to delete the pages it has already downloaded?
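
Failing that, maybe the agent could be recreated while carrying the cookie 
state over. A sketch, assuming the cookie_jar accessor can be read and 
reassigned (again, an assumption on my part):

############################################
jar = agent.cookie_jar			# keep the session cookies
agent = WWW::Mechanize.new		# drop the old agent and its pages
agent.cookie_jar = jar			# restore the session state
############################################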

regards,
Horacio