On 03/29/2012 07:55 AM, Robert Klemme wrote:
> Jeremy Bopp wrote in post #1053841:
>> On 03/28/2012 04:25 PM, Jan E. wrote:
>>> file.print *lines
>>> end
>>>
>>> Yeah, this *is* ugly. I wonder why Ruby cannot handle that itself.
>>
>> In Ruby 1.9, which the OP is using, File.readlines /can/ handle this
>> better. You can specify the mode in which to open the file directly as
>> a hash option.
>
> Using #readlines to copy a file identically is the wrong tool IMHO.

From the OP's example, it appears that copying the file identically is
not the desire.

>> Or is the solution "ugly" because you have to manually specify binary
>> mode when opening files?
>
> I'd rather do it with blocks of fixed length for efficiency reasons:
>
> File.open "oldf.txt", 'rb' do |io_in|
>   File.open "newf.txt", 'wb' do |io_out|
>     buffer = ""
>
>     while io_in.read(1024, buffer)
>       io_out.write(buffer)
>     end
>   end
> end
>
> But what about the dups? What constitutes a duplicate? If it is just
> raw content, you could use "sort -u" (standalone command).

Again from the original example, the records to compare for uniqueness
are simple lines. Of course that simplicity belies the issue of line
endings. ;-) Also, the OP appears to be running on Windows, so "sort
-u" is not available out of the box.

-Jeremy
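
For reference, a minimal sketch of the File.readlines approach with the
mode passed as a hash option, combined with line-based de-duplication.
The helper name dedupe_lines is hypothetical and not from the thread, and
comparing lines with their terminators stripped is only one possible
answer to the line-ending question raised above. The file names in the
usage comment are the ones from Robert's example.

```ruby
# Hypothetical helper (not from the thread): copy src to dst, keeping only
# the first occurrence of each line. Returns the number of lines written.
def dedupe_lines(src, dst)
  # Binary mode keeps "\r\n" untouched on Windows instead of translating it.
  lines = File.readlines(src, mode: "rb")

  # Assumption: two lines are duplicates if they match once the terminator
  # is stripped, so "foo\r\n" and a final unterminated "foo" are the same
  # record. The first occurrence is kept with its original line ending.
  unique = lines.uniq { |line| line.chomp }

  File.open(dst, "wb") { |out| out.write(unique.join) }
  unique.size
end

# Usage, with the file names from the quoted example:
#   dedupe_lines("oldf.txt", "newf.txt")
```

This reads the whole file into memory, which is fine for small inputs but
gives up the efficiency of the fixed-size block copy quoted above.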