Sven Johansson <sven_u_johansson / spray.se> wrote:
> Robert Klemme wrote:
>
> Thank you for your response! Quick and clarifying at the same time.

You're welcome!

<snip/>

> which clearly isn't good. However, both your suggested alternatives
> above work just fine. It would seem that binary mode really is a must
> on Win32 - exchanging 'rb' for 'r' in those suggestions gives me the
> hash repeat problem again.

Good to know. When calculating the hash digest of a file, binary mode is
really the only reasonable thing to do. Guess you just found another
reason. :-)

>> It's not completely clear to me what you want to do here.
>> Apparently you check a number of audio files and shove them
>> somewhere else based on some criterion. What's the aim of doing
>> this?
>
> Oh, it works as it's supposed to do, so I'm not really trying to debug
> it. It takes the hashes of all the files in a directory, compares them
> to a global list of hashes, appends the new unique hashes to that list
> and moves the corresponding files someplace, and moves files that
> already have "their" hashes in the list someplace else. The rest is
> just morphing file names.
>
> I was looking for more input along the lines of "that's not how we do
> it in ruby - this is how we would express this particular sort of
> statement".

Yes, I was aware of that. I just wanted to know the purpose of the code
so I might be able to make more appropriate statements. :-)

> I realise that the first thing I should do is probably to read the
> files by block instead of slurping them in wholesale, and that I would
> be far better off maintaining the global list of hashes in a DB
> instead of in a text file. I'll try my hand at the first, now that
> I've gotten the hash and filehandle issues resolved above... as for the
> second, taking a peek at this group reveals that making ruby talk with
> mysql on Win32 isn't for the faint of heart, so I'll let that be for
> now.
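Reading by block, the first step mentioned in the quote, can be sketched like this. It is a minimal sketch rather than code from this thread: the helper name `chunked_md5` and the 64 KiB chunk size are assumptions. It keeps binary mode ('rb') and feeds the file to `Digest::MD5` piece by piece via `update`, so a whole audio file never has to sit in memory at once.

```ruby
require 'digest/md5'

# Assumed chunk size (64 KiB); any reasonably large value works.
CHUNK_SIZE = 64 * 1024

# Hypothetical helper: MD5 hex digest of a file, read in fixed-size chunks.
def chunked_md5(path)
  md5 = Digest::MD5.new
  # 'rb' matters on Win32: text mode may alter the bytes read
  # (e.g. CR/LF translation), which changes the digest.
  File.open(path, 'rb') do |io|
    # IO#read(n) returns nil at end of file, which ends the loop.
    while (chunk = io.read(CHUNK_SIZE))
      md5.update(chunk)
    end
  end
  md5.hexdigest
end
```

Ruby's digest library also provides `Digest::MD5.file(path).hexdigest`, which opens the file in binary mode and reads it in chunks itself; the explicit loop just makes the mechanism visible.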
The easiest way to store an arbitrary Ruby structure is to use YAML or
Marshal. I'd probably do something like this:

  require 'digest/md5'
  require 'fileutils'

  REPO_FILE = "repo.bin".freeze

  class Repository
    attr_accessor :main_dir, :duplicate_dir, :extensions

    def initialize(extensions = %w{mp3 ogg})
      @extensions = extensions
      @repository = {}   # digest => file name
    end

    def process_dir(dir)
      # find all files with the extensions we support;
      # Dir[] already returns paths that include dir
      Dir[File.join(dir, "*.{#{extensions.join(',')}}")].each do |f|
        process_file(f)
      end
    end

    def process_file(file)
      md5 = digest(file)
      name = @repository[md5]   # file seen earlier with this digest, if any

      if name
        target = duplicate_dir
        # ...
      else
        @repository[md5] = file   # remember the new unique digest
        target = main_dir
        # ...
      end

      FileUtils.cp(file, File.join(target, File.basename(file)))
    end

    def digest(file)
      Digest::MD5.hexdigest(File.open(file, 'rb') {|io| io.read})
    end

    def self.load(file)
      File.open(file, 'rb') {|io| Marshal.load(io)}
    end

    def save(file)
      File.open(file, 'wb') {|io| Marshal.dump(self, io)}
    end
  end

  repo = begin
    Repository.load(REPO_FILE)
  rescue Errno::ENOENT
    # not there => create
    r = Repository.new
    r.main_dir = "foo"
    r.duplicate_dir = "bar"
    r
  end

  ARGV.each {|dir| repo.process_dir(dir)}
  repo.save(REPO_FILE)

The main point here is to encapsulate each piece of functionality in a
method of its own. This greatly increases readability and reusability.

Kind regards

    robert
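For the YAML route, a minimal sketch that persists only the digest => file name Hash (not the whole `Repository` object) might look like this. The file name `digests.yml` and the helpers `load_digests` and `save_digests` are hypothetical, chosen for illustration.

```ruby
require 'yaml'

# Hypothetical default location for the digest list.
DIGEST_FILE = 'digests.yml'

# Load the digest => file name Hash; start empty if the file is missing.
# `|| {}` also covers an empty file, which YAML loads as nil/false.
def load_digests(path = DIGEST_FILE)
  File.exist?(path) ? (YAML.load_file(path) || {}) : {}
end

# Write the Hash back as plain, human-readable YAML.
def save_digests(digests, path = DIGEST_FILE)
  File.write(path, YAML.dump(digests))
end
```

A caller would check `digests.key?(md5)` to spot duplicates and add new entries with `digests[md5] = file` before saving. Unlike Marshal's binary format, the resulting file can be read and edited by hand.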