Dear Mike,
why don't you first build an Array of the files you want
to delete, and do the deletion afterwards?
You can also force a garbage collection run ... maybe
like this:
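class String
  # Names of the entries that are in this directory but not in other_dir.
  def compare_dir(other_dir)
    list1 = Dir.entries(self)
    list2 = Dir.entries(other_dir)
    in_first_but_not_in_second_dir = list1 - list2
    # remove directories from the list; test each name relative to this
    # directory, not relative to the current working directory
    in_first_but_not_in_second_dir.delete_if { |x| File.directory?(File.join(self, x)) }
    return in_first_but_not_in_second_dir
  end
end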
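class Array
  # Deletes every file named in the array. The names must be resolvable
  # from the current directory (compare_dir returns bare names).
  def delete_files
    self.each { |x| File.delete(x) }
    GC.start # one forced collection after the loop; one per file is slow
    self
  end
end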
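A minimal sketch of the same collect-first, delete-afterwards idea without reopening String and Array. The method names names_only_in_first and delete_named_files are made up for this illustration; every name is joined with its directory so that the result does not depend on the current working directory.

```ruby
# Hypothetical helper names, not part of the code above.

# Returns the names of plain files that exist in dir_a but not in dir_b.
def names_only_in_first(dir_a, dir_b)
  only_in_a = Dir.entries(dir_a) - Dir.entries(dir_b)
  # Keep plain files only; test each name relative to dir_a.
  only_in_a.select { |name| File.file?(File.join(dir_a, name)) }
end

# Deletes the named files from dir and returns how many were removed.
def delete_named_files(dir, names)
  names.each { |name| File.delete(File.join(dir, name)) }
  names.length
end
```

Called as delete_named_files(dir, names_only_in_first(dir, other_dir)), the list is built completely before the first file is removed.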
I compared just the names of roughly 3500 files in two
directories in a fraction of a second like this,
but I haven't tried to erase them...
Best regards,
Axel
t=Time.now
p "/home/axel".compare_dir("/home/axel/ruby").length
p Time.now-t
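On the hash question itself: in the RemoveIdenticalDirs method quoted below, new_fe_list.keys is called once for every old key, and each call builds a fresh Array of all keys. With about 10,000 entries on each side that is probably where much of the garbage comes from. A sketch of an alternative, assuming the entries only need an is_dir flag (Entry and remove_identical_dirs are hypothetical names for this illustration): index the new directories by lower-cased name once, then look each old directory up in that index.

```ruby
# Entry is a hypothetical stand-in for the objects carrying the is_dir flag.
Entry = Struct.new(:is_dir)

def remove_identical_dirs(old_fe_list, new_fe_list)
  # Build the index once: lower-cased directory name => original keys.
  new_dirs = Hash.new { |hash, key| hash[key] = [] }
  new_fe_list.each do |name, entry|
    new_dirs[name.downcase] << name if entry.is_dir
  end

  # keys is copied once here so that deleting during the loop is safe.
  old_fe_list.keys.each do |oldfile|
    next unless old_fe_list[oldfile].is_dir
    # fetch with an explicit default does not add entries to the index.
    newfile = new_dirs.fetch(oldfile.downcase, []).shift
    next if newfile.nil?
    old_fe_list.delete(oldfile)
    new_fe_list.delete(newfile)
  end
end
```

This does one pass over each hash instead of a nested loop, and it still removes at most one new directory per old directory, as the quoted version does with its break.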
-------- Original Message --------
Date: Tue, 26 Jun 2007 11:35:54 +0900
From: "Mike Steiner" <mikejaysteiner / gmail.com>
To: ruby-talk / ruby-lang.org
Subject: optimizing Hash deletions
> I have a recursive file-compare program that I run on 2 directories that
> each contain about 10,000 files, 99% of which don't change. I read all the
> filenames into hashes and then (first) delete all the matches, using this
> code:
>
> def RemoveIdenticalDirs ( old_fe_list , new_fe_list )
>   # this function is necessary because we have to do a case-insensitive
>   # match for directories, but not for files
>   $stderr.puts "Detecting identical dirs..."
>   old_fe_list.keys.each do | oldfile |
>     # can't use .has_key? because we have to do a case-insensitive match
>     new_fe_list.keys.each do | newfile |
>       if old_fe_list[oldfile].is_dir and new_fe_list[newfile].is_dir and oldfile.downcase == newfile.downcase
>         old_fe_list.delete ( oldfile )
>         new_fe_list.delete ( newfile )
>         break
>       end
>     end
>   end
> end
>
> def RemoveIdenticalFiles ( old_fe_list , new_fe_list , comparetype )
>   $stderr.puts "Detecting identical files..."
>   old_fe_list.keys.each do | file |
>     if new_fe_list.has_key? ( file )
>       if !(old_fe_list[file].is_dir) and !(new_fe_list[file].is_dir) and FilesAreIdentical ( comparetype , old_fe_list[file] , new_fe_list[file] )
>         old_fe_list.delete ( file )
>         new_fe_list.delete ( file )
>       end
>     end
>   end
> end
>
> Note that I've added an attribute to each hash called .is_dir which just
> holds a Boolean value.
>
> When I run the program (under WinXP using Ruby 185-21, if it matters), it
> takes about 5-10 seconds to execute the above 2 functions. But that's not
> the worst part - it chews up memory big time. The machine I'm running it
> on has 768MB of RAM (with no swap file), and Windows gives me a warning
> that it's out of memory as the above code runs. However, neither Windows
> nor the program crashes. I'm guessing that all the .delete's are causing
> lots of memory usage and the garbage collector starts running.
>
> So my question is - is there a less memory-intensive way of doing the
> above? Would setting elements to nil instead of deleting them make any
> difference? Or maybe copying entries to a new hash somehow?
>
> Mike Steiner