Bill Kelly <billk / cts.com> wrote:
> From: "rtilley" <rtilley / vt.edu>
>>
>> I'm calculating md5 checksums on very large files (2 GB). This is a
>> safe way to do so, right? Also... is the file closed when the block
>> exits? I'm using 'rb' as this is used on Windows and Linux computers.
>>
>> md5 = Digest::MD5.new()
>> File.open(file, 'rb').each {|line| md5.update(line)}
>
> Hi - does the file really contain text lines? Or is it a file
> full of binary data? If it's a binary file, there may be no
> guarantee the whole thing isn't one very long "line". In that
> case I'd recommend reading it in chunks.
>
> Untested:
>
> md5 = Digest::MD5.new()
> File.open(file, 'rb') do |io|
>   while (buf = io.read(4096)) && buf.length > 0
>     md5.update(buf)
>   end
> end

io.read(4096) returns nil at EOF, so your test for a positive length is redundant.

Also, for the sake of error handling I'd create the digest inside the block: then no digest is created at all if the file cannot be opened, and since File.open with a block returns the block's value, the digest comes out as the result:

  md5 = File.open(file, 'rb') do |io|
    dig = Digest::MD5.new
    while (buf = io.read(4096))
      dig.update(buf)
    end
    dig
  end

If you want more efficiency, you can pass a buffer to read, which keeps it from allocating a new String for every chunk:

  md5 = File.open(file, 'rb') do |io|
    dig = Digest::MD5.new
    buf = ""
    while io.read(4096, buf)
      dig.update(buf)
    end
    dig
  end

Here's another nice variant, using a while modifier:

  md5 = File.open(file, 'rb') do |io|
    dig = Digest::MD5.new
    buf = ""
    dig.update(buf) while io.read(4096, buf)
    dig
  end

Kind regards

robert
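To tie the thread together, here is a minimal sketch that wraps the buffered variant in a reusable method. The method name md5_of_file and the 64 KiB default chunk size are hypothetical choices for illustration, not something from the thread. It also answers the original question about closing: File.open with a block closes the file when the block exits, even if an exception is raised, whereas File.open(file, 'rb').each { ... } as in the first post leaves the file open until it is garbage-collected. The buffer is created with String.new rather than "" so it stays mutable even if frozen string literals are enabled.

```ruby
require 'digest/md5'

# Hypothetical helper (not from the thread): MD5 of a file, read in
# fixed-size chunks into one reused buffer. The block form of File.open
# closes the file when the block exits, even on an exception.
def md5_of_file(path, chunk_size = 64 * 1024)
  File.open(path, 'rb') do |io|
    dig = Digest::MD5.new
    buf = String.new                 # mutable buffer reused by every read
    # io.read(n, buf) fills buf and returns it, or returns nil at EOF
    dig.update(buf) while io.read(chunk_size, buf)
    dig.hexdigest                    # block value becomes File.open's result
  end
end
```

Usage: md5_of_file("big.iso") returns the hex digest as a String. The standard library also offers Digest::MD5.file("big.iso").hexdigest, which should produce the same value in a single call.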