On Nov 29, 2006, at 11:29 AM, Paul Lutus wrote:

> Will Jessop wrote:
>
>> Paul Lutus wrote:
>>>> CSV.open(infile, 'r') do |row|
>>>
>>> Why are you using CSV for this? You aren't parsing the lines into
>>> fields, so the fact that they contain CSV content has no bearing on
>>> the present task. Your goal is to split the input file into groups
>>> of lines delimited by linefeeds, not fields delimited by commas.
>>>
>>> Why not simply read lines from the input file and write them to a
>>> series of output files, until the input file is exhausted?
>>
>> Because CSV understands csv data with embedded newlines:
>
> A plain-text CSV file uses linefeeds as record delimiters. A program
> that uses "readline" or "gets" splits the records just as a sane CSV
> parser would. And IMHO a CSV file should never, ever have linefeeds
> embedded in fields.

Your opinion doesn't make you right on this one. The CSV RFC clearly
defines handling for carriage returns and linefeeds; they certainly are
allowed in fields. Here is a link to the document, in case you want to
read up:

http://www.ietf.org/rfc/rfc4180.txt

Not using a CSV parser on this task would be shooting yourself in the
foot. The result using a plain File object would be broken and, much
worse, it might look OK for a while. You just can't be sure you will
never split a CSV file that has an embedded linefeed in it (especially
since that's perfectly legal), and when you do you will be responsible
for destroying data. There's just no reason for that.
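To make the danger concrete, here's a minimal sketch with made-up data
(it uses the current csv standard library API, which differs from the
1.8-era one):

```ruby
require "csv"

# One record whose "comment" field legally contains an embedded
# linefeed, quoted per RFC 4180.
data = "id,comment\n1,\"line one\nline two\"\n2,plain\n"

# Naive line splitting sees four "records" -- it cuts the quoted
# field in half:
naive = data.split("\n")
naive.size        # => 4

# A real CSV parser keeps the quoted field intact:
rows = CSV.parse(data, headers: true)
rows.size              # => 2
rows[0]["comment"]     # => "line one\nline two"
```

The naive split produces garbage silently, which is exactly the
"looks OK for a while" failure mode.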

I know you're a don't-use-a-library guy and you know I disagree.   
This is the reason why.  The edge cases will get you every time.

James Edward Gray II