On Mon, Mar 7, 2011 at 11:09 AM, Xavier Noria <fxn / hashref.com> wrote:
> On Mon, Mar 7, 2011 at 11:48 AM, New C. <coding25 / yahoo.com> wrote:
>
>> I have got a few folders which may have the same files under different
>> names. Is there any way I can find which files these are using Ruby?
>> ...
>> I'm wondering if there is some sort of diff module that can do this.
>
> There's File.compare.
>
> Depending on how many comparisons you're going to do, it might be a
> good idea to precompute checksums and compare the checksums.

I'm interested in any (Ruby) solutions (actual or ideas) for this, as
I have needed to do it in the past, and want to do something similar
in the very near future.

For comparing directories where the file names might have changed,
what I've done in the past is to match on file name first. Then, for
the files left unmatched in each directory, I look for matches on file
size. If only two files share a size, I compare them directly with
File.compare. Otherwise I compute checksums to exclude files that
definitely don't match, and then use File.compare on whatever (if
anything) still matches on both size and checksum.
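
Roughly, the steps above might look like the sketch below. This is
only an illustration, not the code I actually used: the method name
renamed_matches is made up, it looks at top-level files only, and it
assumes a Ruby recent enough to have Dir.children. It uses
FileUtils.compare_file from the standard library for the byte-for-byte
check (File.compare comes from the old ftools library, as far as I
know), and SHA256 is an arbitrary choice of checksum.

```ruby
require 'digest'
require 'fileutils'

# Hypothetical helper: returns [path_in_a, path_in_b] pairs whose contents
# are identical even though the file names differ. Top-level files only.
def renamed_matches(dir_a, dir_b)
  list = lambda do |dir|
    Dir.children(dir).map { |name| File.join(dir, name) }
       .select { |path| File.file?(path) }
  end
  a_files = list.call(dir_a)
  b_files = list.call(dir_b)

  # Step 1: set aside names that appear in both directories.
  common = a_files.map { |p| File.basename(p) } &
           b_files.map { |p| File.basename(p) }
  a_rest = a_files.reject { |p| common.include?(File.basename(p)) }
  b_rest = b_files.reject { |p| common.include?(File.basename(p)) }

  # Step 2: only files of equal size can have equal contents.
  b_by_size = b_rest.group_by { |p| File.size(p) }

  # Checksums are computed lazily and remembered per path.
  digest = Hash.new { |h, p| h[p] = Digest::SHA256.file(p).hexdigest }

  a_rest.flat_map do |a|
    candidates = b_by_size.fetch(File.size(a), [])
    # Step 3: with several candidates, checksums rule out non-matches.
    if candidates.size > 1
      candidates = candidates.select { |b| digest[b] == digest[a] }
    end
    # Step 4: confirm whatever remains byte for byte.
    candidates.select { |b| FileUtils.compare_file(a, b) }
              .map { |b| [a, b] }
  end
end
```

Calling renamed_matches('old_dir', 'new_dir') (directory names made up)
would return an array of path pairs. Files whose names match in both
directories are skipped entirely here; whether their contents also
match is a separate check.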

I assume something similar would work for finding duplicates in
general, not just comparing directories? (If there are likely to be
many matches on file size, then presumably one might as well compute
checksums for all files?)
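
To make the question concrete, here is a minimal sketch of how the
general case might go: group every file by size, checksum only the
files that share a size with at least one other file, then confirm the
survivors byte for byte. The method names duplicate_groups and
confirm_identical are made up for illustration, SHA256 is again an
arbitrary choice, and symlinks or hard links pointing at the same file
are not treated specially.

```ruby
require 'digest'
require 'fileutils'
require 'find'

# Hypothetical helper: walks the given directories recursively and returns
# arrays of paths whose contents are identical.
def duplicate_groups(*roots)
  paths = []
  Find.find(*roots) { |path| paths << path if File.file?(path) }

  paths.group_by { |p| File.size(p) }.values
       .select { |same_size| same_size.size > 1 }   # unique sizes drop out
       .flat_map do |same_size|
         same_size.group_by { |p| Digest::SHA256.file(p).hexdigest }.values
       end
       .select { |same_sum| same_sum.size > 1 }     # unique checksums drop out
       .flat_map { |same_sum| confirm_identical(same_sum) }
end

# Hypothetical helper: splits a list of paths into sets that are
# byte-for-byte identical, keeping only sets with more than one member.
def confirm_identical(paths)
  groups = []
  paths.each do |path|
    match = groups.find { |g| FileUtils.compare_file(g.first, path) }
    if match
      match << path
    else
      groups << [path]
    end
  end
  groups.select { |g| g.size > 1 }
end
```

With this shape, a file whose size is unique is never read at all,
which is the main saving. If most files share sizes anyway, the size
step buys little and it comes close to checksumming everything.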