On 03/29/2012 07:55 AM, Robert Klemme wrote:
> Jeremy Bopp wrote in post #1053841:
>> On 03/28/2012 04:25 PM, Jan E. wrote:
>>>   file.print *lines
>>> end
>>>
>>> Yeah, this *is* ugly. I wonder why Ruby cannot handle that itself.
>>
>> In Ruby 1.9, which the OP is using, File.readlines /can/ handle this
>> better.  You can specify the mode in which to open the file directly as
>> a hash option.
> 
> Using #readlines to copy a file identically is the wrong tool IMHO.

From the OP's example, it appears that an identical copy of the file is
not the goal; the aim is to drop duplicate lines along the way.

>> Or is the solution "ugly" because you have to manually specify binary
>> mode when opening files?
> 
> I'd rather do it with blocks of fixed length for efficiency reasons:
> 
> File.open "oldf.txt", 'rb' do |io_in|
>   File.open "newf.txt", 'wb' do |io_out|
>     buffer = ""
> 
>     while io_in.read(1024, buffer)
>       io_out.write(buffer)
>     end
>   end
> end
> 
> But what about the dups?  What constitutes a duplicate?  If it is just
> raw content, you could use "sort -u" (standalone command).

Again from the original example, the records to compare for uniqueness
are simple lines.  Of course that simplicity belies the issue of line
endings. ;-)
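
To make the line-ending point concrete, here is a minimal sketch, not a
definitive solution. It assumes two lines count as duplicates when they
match after the trailing "\r\n" or "\n" is stripped, and that the first
occurrence should be kept with its original ending. The method name
unique_lines is made up for illustration.

```ruby
# Hypothetical helper: copy src to dest, dropping duplicate lines.
# Assumption: lines are duplicates if they match once the line ending is removed.
def unique_lines(src, dest)
  # Binary mode stops Ruby from translating "\r\n" on Windows.
  lines = File.readlines(src, mode: "rb")

  # Compare on the chomped text; keep the first occurrence, ending intact.
  unique = lines.uniq { |line| line.chomp }

  # Write the surviving lines back out byte for byte.
  File.open(dest, "wb") { |out| out.write(unique.join) }
end
```

Comparing on line.chomp means "foo\r\n" and "foo\n" collapse into one
record, which may or may not be what the OP wants.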

Also, the OP appears to be running on Windows, so "sort -u" is not
available out of the box.
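
If sorted output is acceptable, a rough stand-in for "sort -u" can be
sketched in Ruby itself. This is only an approximation under stated
assumptions: it sorts bytewise (no locale-aware collation), treats "\r\n"
and "\n" endings as equivalent, and writes every line back with "\n". The
name sort_unique is hypothetical.

```ruby
# Hypothetical helper approximating `sort -u` for a line-based text file.
# Assumptions: bytewise ordering, output lines always end in "\n".
def sort_unique(src, dest)
  # Read in binary mode and strip line endings so comparisons ignore them.
  lines = File.readlines(src, mode: "rb").map { |line| line.chomp }

  File.open(dest, "wb") do |out|
    # Drop duplicates, sort what is left, and emit one line per record.
    lines.uniq.sort.each { |line| out.write(line + "\n") }
  end
end
```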

-Jeremy