Robert Klemme wrote:
> The easiest way to store some arbitrary Ruby structure is to use YAML or
> Marshal. I'd probably do something like this:
>
> require 'digest/md5'
> require 'fileutils'
>
> REPO_FILE = "repo.bin".freeze
>
> class Repository
>   attr_accessor :main_dir, :duplicate_dir, :extensions
>
>   def initialize(extensions = %w{mp3 ogg})
>     @extensions = extensions
>     @repository = {}
>   end
>
>   def process_dir(dir)
>     # find all files with the extensions we support
>     # (Dir[] already returns paths that include dir)
>     Dir[File.join(dir, "*.{#{extensions.join(',')}}")].each do |f|
>       process_file( f )
>     end
>   end
>
>   def process_file(file)
>     digest = digest(file)
>     name = @repository[digest]
>
>     if name
>       target = duplicate_dir
>       # ...
>     else
>       target = main_dir
>       # ...
>     end
>
>     FileUtils.cp( file, File.join( target, File.basename( file ) ) )
>   end
>
>   def digest(file)
>     Digest::MD5.hexdigest( File.open(file, 'rb') {|io| io.read})
>   end
>
>   def self.load(file)
>     File.open(file, 'rb') {|io| Marshal.load(io)}
>   end
>
>   def save(file)
>     File.open(file, 'wb') {|io| Marshal.dump(self, io)}
>   end
> end
>
>
> repo = begin
>   Repository.load( REPO_FILE )
> rescue Exception => e
>   # not there => create
>   r = Repository.new
>   r.main_dir = "foo"
>   r.duplicate_dir = "bar"
>   r
> end
>
> ARGV.each {|dir| repo.process_dir(dir)}
>
> repo.save( REPO_FILE )
>
> The main point being here to encapsulate certain functionality into methods
> of their own. This greatly increases readability and reusability.

Very informative indeed, if perhaps more than a bit humbling! Thank you again. One last question, then - while the style above is easily more readable and quite... enjoyable, for lack of a better word, to read, how does Ruby measure up when it comes to passing all those variables around to functions (method calls) all the time? Do I lose significant performance by having method calls in inner loops?

And no, I can hear it already: "Dude, you traverse big directories, do calculations on a big number of big files and push the filesystem to its limits copying them like there was no tomorrow already..." Obviously, it doesn't matter here. But would it matter if one was writing, say, a port listener or some other reasonably performance-critical application?
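
To make the question concrete, this is roughly how I would try to measure it myself. It is only a minimal sketch: `add`, `sum_inline`, `sum_via_method`, `time_it` and the iteration count `N` are names and values I made up for illustration, and the numbers will depend on the Ruby version and the machine, so I am not claiming any particular result.

```ruby
# Hypothetical helper, invented for this illustration only.
def add(a, b)
  a + b
end

# Sum 0...n with the addition written directly in the loop body.
def sum_inline(n)
  total = 0
  n.times { |i| total = total + i }
  total
end

# Same sum, but every addition goes through a method call.
def sum_via_method(n)
  total = 0
  n.times { |i| total = add(total, i) }
  total
end

# Return the elapsed wall-clock seconds taken by the given block.
def time_it
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

N = 1_000_000 # arbitrary iteration count, chosen only for the example

inline_secs = time_it { sum_inline(N) }
method_secs = time_it { sum_via_method(N) }

puts format("inline:      %.4f s", inline_secs)
puts format("method call: %.4f s", method_secs)
```

Both loops compute the same sum, so any difference between the two timings should come from the extra method call per iteration. Whether that difference matters next to the I/O a port listener does is exactly what I am unsure about.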