------art_301_1351220.1182825356089
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

I have a recursive file-compare program that I run on 2 directories that
each contain about 10,000 files, 99% of which don't change. I read all the
filenames into hashes and then (first) delete all the matches, using this
code:

def RemoveIdenticalDirs ( old_fe_list , new_fe_list )
    # this function is necessary because we have to do a case-insensitive
match for directories, but not for files
    $stderr.puts "Detecting identical dirs..."
    old_fe_list.keys.each do | oldfile |
        new_fe_list.keys.each do | newfile |   # can't use .has_key? because
we have to do a case-insensitive match
            if old_fe_list[oldfile].is_dir and new_fe_list[newfile].is_dir
and oldfile.downcase newfile.downcase
                old_fe_list.delete ( oldfile )
                new_fe_list.delete ( newfile )
                break
            end
        end
    end
end

def RemoveIdenticalFiles ( old_fe_list , new_fe_list , comparetype )
    $stderr.puts "Detecting identical files..."
    old_fe_list.keys.each do | file |
        if new_fe_list.has_key? ( file )
            if !(old_fe_list[file].is_dir) and !(new_fe_list[file].is_dir)
and FilesAreIdentical ( comparetype , old_fe_list[file] , new_fe_list[file]
)
                old_fe_list.delete ( file )
                new_fe_list.delete ( file )
            end
        end
    end
end

Note that I've added an attribute to each hash called .is_dir which just
holds a Boolean value.

When I run the program (under WinXP using Ruby 185-21, if it matters), it
takes about 5-10 seconds to execute the above 2 functions. But that's not
the worst part - it chews up memory big time. The machine I'm running it on
has 768MB of RAM (with no swap file), and Windows gives me a warning that
it's out of memory as the above code runs. However, neither Windows nor the
program crashes. I'm guessing that all the .delete's are causing lots of
memory usage and the garbage collector starts running.

So my question is - is there a less memory-intensive way of doing the above?
Would setting elements to nil instead of deleting them make any difference?
Or maybe copying entries to a new hash somehow?

Mike Steiner

------art_301_1351220.1182825356089--