Robert Klemme wrote:

> The easiest way to store some arbitrary Ruby structure is to use YAML or
> Marshal.  I'd probably do something like this:
>
> require 'digest/md5'
> require 'fileutils'
>
> REPO_FILE = "repo.bin".freeze
>
> class Repository
>   attr_accessor :main_dir, :duplicate_dir, :extensions
>
>   def initialize(extensions = %w{mp3 ogg})
>     @extensions = extensions
>     @repository = {}
>   end
>
>   def process_dir(dir)
>     # find all files with the extensions we support
>     Dir[File.join(dir, "*.{#{extensions.join(',')}}")].each do |f|
>       # Dir[] already returns paths prefixed with dir
>       process_file(f)
>     end
>   end
>
>   def process_file(file)
>     digest = digest(file)
>     name = @repository[digest]
>
>     if name
>       target = duplicate_dir
>       # ...
>     else
>       target = main_dir
>       # ...
>     end
>
>     FileUtils.cp( file, File.join( target, File.basename( file ) ) )
>   end
>
>   def digest(file)
>     Digest::MD5.hexdigest( File.open(file, 'rb') {|io| io.read})
>   end
>
>   def self.load(file)
>     File.open(file, 'rb') {|io| Marshal.load(io)}
>   end
>
>   def save(file)
>     File.open(file, 'wb') {|io| Marshal.dump(self, io)}
>   end
> end
>
>
> repo = begin
>   Repository.load( REPO_FILE )
> rescue Errno::ENOENT
>   # not there => create
>   r = Repository.new
>   r.main_dir = "foo"
>   r.duplicate_dir = "bar"
>   r
> end
>
> ARGV.each {|dir| repo.process_dir(dir)}
>
> repo.save( REPO_FILE )
>
> The main point here is to encapsulate certain functionality into methods
> of their own.  This greatly increases readability and reusability.

Very informative indeed, if perhaps more than a bit humbling! Thank you
again.

One last question, then - while the style above is easily more readable
and quite... enjoyable, for lack of a better word, to read, how does
Ruby measure up when it comes to passing all those variables around to
functions (method calls) all the time? Do I lose significant
performance by having method calls in inner loops? And no, I can hear
it already; "Dude, you traverse big directories, do calculations on a
big number of big files and push the filesystem to its limits copying
them like there was no tomorrow already..." Obviously, it doesn't
matter here. But would it matter if one were writing, say, a port
listener or some other reasonably performance-critical application?
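
For what it's worth, I suppose one way to get a feel for this is simply to
time it. Below is a rough sketch using Benchmark from the standard library.
The add_one helper and the iteration count are made up purely for
illustration, and the timings will vary with the machine and the Ruby
version, so it only hints at the relative cost of a method call in a tight
loop:

```ruby
require 'benchmark'

# Hypothetical helper, here only so the loop has a method to call.
def add_one(x)
  x + 1
end

n = 200_000  # arbitrary iteration count, chosen for illustration

# Variant 1: do the work inline inside the loop.
inline_sum = 0
inline_time = Benchmark.realtime do
  n.times { |i| inline_sum += i + 1 }
end

# Variant 2: do the same work through a method call.
call_sum = 0
call_time = Benchmark.realtime do
  n.times { |i| call_sum += add_one(i) }
end

puts "inline:      #{'%.4f' % inline_time}s"
puts "method call: #{'%.4f' % call_time}s"
```

Both loops compute the same sum, so any difference in the two timings
comes from the extra method dispatch (plus ordinary measurement noise).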