Sven Johansson <sven_u_johansson / spray.se> wrote:
> Robert Klemme wrote:
>
> Thank you for your response! Quick and clarifying at the same time.

You're welcome!

<snip/>

> which clearly isn't good. However, both your suggested alternatives
> above work just fine. It would seem that binary mode really is a must
> on Win32 - exchanging 'rb' for 'r' in those suggestions gives me the
> hash repeat problem again. Good to know.

When calculating the hash digest of a file, binary mode is really the only
reasonable way to read it.  Guess you just found another reason. :-)
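
A minimal sketch of why, assuming a hypothetical helper file_md5 and a
throwaway temp file: read in binary mode, the file's digest matches the
digest of the exact bytes that were written, including CR/LF pairs and a
Ctrl-Z byte, both of which text mode on Win32 can translate or treat as
end of file.

```ruby
require 'digest/md5'
require 'tempfile'

# Hypothetical helper (not from the original script): MD5 of a file's raw bytes.
def file_md5(path)
  Digest::MD5.hexdigest(File.binread(path))
end

# Bytes that text mode on Win32 may alter: CR/LF pairs and Ctrl-Z (0x1A).
bytes = "line one\r\nline two\x1A\r\ntrailing data".b

Tempfile.create('digest-demo') do |tmp|
  tmp.binmode
  tmp.write(bytes)
  tmp.flush
  # Compare the digest of the file with the digest of the in-memory bytes.
  puts file_md5(tmp.path) == Digest::MD5.hexdigest(bytes)
end
```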

>> It's not completely clear to me what you want to do here.
>> Apparently you check a number of audio files and shove them
>> somewhere else based on some criterion.  What's the aim of doing
>> this?
>
> Oh, it works as it's supposed to do, so I'm not really trying to debug
> it. It takes the hashes of all the files in a directory, compares them
> to a global list of hashes, appends the new unique hashes to that list
> and moves the corresponding files someplace, moves files that already
> have "their" hashes in the list someplace else. The rest is just
> morphing file names.
>
> I was looking for more input along the line of "that's not how we do
> it in ruby - this is how we would express this particular sort of
> statement".

Yes, I was aware of that.  I just wanted to know the purpose of the code so 
I might be able to make more appropriate statements. :-)

> I realise that the first thing I should do is probably to read the
> files by block instead of slurping them in wholesale, and that I would
> be far better off maintaining the global list of hashes in a DB
> instead of in a text file. I'll try my hands at the first, now that
> I've gotten the hash and filehandle issue resolved above... as for the
> second, taking a peek at this group reveals that making ruby talk with
> mysql on Win32 isn't for the faint of heart, so I'll let that be for
> now.
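
Reading by block is not much work.  A rough sketch, assuming a
hypothetical method name digest_by_block and an arbitrary block size of
64 KiB; the file is still opened in binary mode and the digest object is
fed one chunk at a time:

```ruby
require 'digest/md5'

# Hypothetical block size; 64 KiB is an arbitrary choice, not a tuned value.
BLOCK_SIZE = 64 * 1024

# Computes the MD5 hex digest of a file without loading it into memory at once.
def digest_by_block(path)
  md5 = Digest::MD5.new
  File.open(path, 'rb') do |io|
    # io.read(n) returns nil at end of file, which ends the loop.
    while (chunk = io.read(BLOCK_SIZE))
      md5.update(chunk)
    end
  end
  md5.hexdigest
end
```

The result is the same string you get from hashing the slurped file, so
it can replace the slurping version without touching anything else.
Digest::MD5.file(path).hexdigest in the digest library should give the
same result, if your Ruby version has it.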

The easiest way to store some arbitrary Ruby structure is to use YAML or 
Marshal.  I'd probably do something like this:

require 'digest/md5'
require 'fileutils'

REPO_FILE = "repo.bin".freeze

class Repository
  attr_accessor :main_dir, :duplicate_dir, :extensions

  def initialize(extensions = %w{mp3 ogg})
    @extensions = extensions
    @repository = {}
  end

  def process_dir(dir)
    # find all files with the extensions we support;
    # Dir[] already returns paths that include dir
    Dir[File.join(dir, "*.{#{extensions.join(',')}}")].each do |f|
      process_file(f)
    end
  end

  def process_file(file)
    digest = digest(file)
    name = @repository[digest]

    if name
      target = duplicate_dir
      # ...
    else
      target = main_dir
      # ...
    end

    FileUtils.cp( file, File.join( target, File.basename( file ) ) )
  end

  def digest(file)
    Digest::MD5.hexdigest( File.open(file, 'rb') {|io| io.read})
  end

  def self.load(file)
    File.open(file, 'rb') {|io| Marshal.load(io)}
  end

  def save(file)
    File.open(file, 'wb') {|io| Marshal.dump(self, io)}
  end
end


repo = begin
  Repository.load( REPO_FILE )
rescue Errno::ENOENT
  # repository file not there => create a fresh one
  r = Repository.new
  r.main_dir = "foo"
  r.duplicate_dir = "bar"
  r
end

ARGV.each {|dir| repo.process_dir(dir)}

repo.save( REPO_FILE )

The main point here is to encapsulate each piece of functionality in a
method of its own.  This greatly improves readability and reusability.
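
If you would rather have a file you can read in an editor than Marshal's
binary format, YAML works much the same way.  A sketch, assuming you
store only the digest => file name hash and not the whole object; the
helper names save_digests and load_digests are made up for illustration:

```ruby
require 'yaml'

# Hypothetical helper: writes the digest => file name hash as YAML text.
def save_digests(path, digests)
  File.open(path, 'w') { |io| io.write(digests.to_yaml) }
end

# Hypothetical helper: returns the stored hash, or an empty hash if the
# file does not exist yet or contains no document.
def load_digests(path)
  return {} unless File.exist?(path)
  YAML.load_file(path) || {}
end
```

With these, load and save in the class above would only need to read and
write @repository instead of dumping self.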

Kind regards

    robert