Dear Mike,

why don't you first create an Array of files that you want
to delete, and do the deletion after that ?
You can start the garbage collection by force also ... maybe
like this:

class String
	def compare_dir(other_dir)
		list1 = Dir.entries(self)
		list2=Dir.entries(other_dir)
		in_first_but_not_in_second_dir=list1-list2
		# remove directories from the list
		in_first_but_not_in_second_dir.delete_if{|x| FileTest::directory?(x)}
		return in_first_but_not_in_second_dir
	end
end
class Array		
	def delete_files
		self.each{|x|
			File.delete(x)
			GC.start
		}
	end
end

I compared just the names of roughly 3500 files in two
directories in a fraction of a second like this,
but I haven't tried to erase  them...
Best regards,

Axel

t=Time.now
p "/home/axel".compare_dir("/home/axel/ruby").length
p Time.now-t


-------- Original-Nachricht --------
Datum: Tue, 26 Jun 2007 11:35:54 +0900
Von: "Mike Steiner" <mikejaysteiner / gmail.com>
An: ruby-talk / ruby-lang.org
Betreff: optimizing Hash deletions

> I have a recursive file-compare program that I run on 2 directories that
> each contain about 10,000 files, 99% of which don't change. I read all the
> filenames into hashes and then (first) delete all the matches, using this
> code:
> 
> def RemoveIdenticalDirs ( old_fe_list , new_fe_list )
>     # this function is necessary because we have to do a case-insensitive
> match for directories, but not for files
>     $stderr.puts "Detecting identical dirs..."
>     old_fe_list.keys.each do | oldfile |
>         new_fe_list.keys.each do | newfile |   # can't use .has_key?
> because
> we have to do a case-insensitive match
>             if old_fe_list[oldfile].is_dir and new_fe_list[newfile].is_dir
> and oldfile.downcase == newfile.downcase
>                 old_fe_list.delete ( oldfile )
>                 new_fe_list.delete ( newfile )
>                 break
>             end
>         end
>     end
> end
> 
> def RemoveIdenticalFiles ( old_fe_list , new_fe_list , comparetype )
>     $stderr.puts "Detecting identical files..."
>     old_fe_list.keys.each do | file |
>         if new_fe_list.has_key? ( file )
>             if !(old_fe_list[file].is_dir) and !(new_fe_list[file].is_dir)
> and FilesAreIdentical ( comparetype , old_fe_list[file] ,
> new_fe_list[file]
> )
>                 old_fe_list.delete ( file )
>                 new_fe_list.delete ( file )
>             end
>         end
>     end
> end
> 
> Note that I've added an attribute to each hash called .is_dir which just
> holds a Boolean value.
> 
> When I run the program (under WinXP using Ruby 185-21, if it matters), it
> takes about 5-10 seconds to execute the above 2 functions. But that's not
> the worst part - it chews up memory big time. The machine I'm running it
> on
> has 768MB of RAM (with no swap file), and Windows gives me a warning that
> it's out of memory as the above code runs. However, neither Windows nor
> the
> program crashes. I'm guessing that all the .delete's are causing lots of
> memory usage and the garbage collector starts running.
> 
> So my question is - is there a less memory-intensive way of doing the
> above?
> Would setting elements to nil instead of deleting them make any
> difference?
> Or maybe copying entries to a new hash somehow?
> 
> Mike Steiner

-- 
GMX FreeMail: 1 GB Postfach, 5 E-Mail-Adressen, 10 Free SMS.
Alle Infos und kostenlose Anmeldung: http://www.gmx.net/de/go/freemail