On Oct 31, 2005, at 11:59 AM, Ara.T.Howard wrote:

>> If you meant to leave the \r in, it gets more complicated.  My  
>> code says that's ["\r", "\r"].
>
> hmmm.  that can't be right.  no matter what - a \r or \n that is not
> between quotes cannot be part of a cell:
>
>   field = (escaped / non-escaped)
>
>   escaped = DQUOTE *(TEXTDATA / COMMA / CR / LF / 2DQUOTE) DQUOTE
>
>   non-escaped = *TEXTDATA
>
> so comma, \r, \n, and "  can __never__ be part of a cell unless  
> double quoted.

Ah, I forgot to glance at the grammar.  The above pretty much proves
(to me) that we're looking at malformed CSV.
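
For anyone who wants to poke at the grammar directly, here is a rough
sketch of the quoted rules as Ruby regular expressions.  The names are
mine, and TEXTDATA is an assumption: the quoted excerpt doesn't define
it, so I've taken it to mean "anything except comma, CR, LF, and double
quote."

```ruby
# Sketch of the quoted field grammar.  TEXTDATA is assumed here to be any
# character other than comma, CR, LF, and double quote (the excerpt does
# not define it).
TEXTDATA    = /[^",\r\n]/

# non-escaped = *TEXTDATA
NON_ESCAPED = /#{TEXTDATA}*/

# escaped = DQUOTE *(TEXTDATA / COMMA / CR / LF / 2DQUOTE) DQUOTE
# Inside the quotes anything goes, as long as a quote is doubled.
ESCAPED     = /"(?:[^"]|"")*"/

# field = (escaped / non-escaped), anchored to the whole string
FIELD       = /\A(?:#{ESCAPED}|#{NON_ESCAPED})\z/

# Hypothetical helper: does this text form one well-formed field?
def well_formed_field?(text)
  FIELD.match?(text)
end
```

Under those assumptions a bare "a\rb" is rejected, while the same text
wrapped in double quotes is accepted, which matches Ara's reading.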

That's good news, because it means whatever we decide is right.  ;)   
We're off the map here anyway.

To me, it's a decision about what you consider \r to be.  The current  
CSV seems to call it a line ending.  I'm more inclined to consider it  
whitespace.  If I found a \v (vertical tab), I would leave it in the  
field because spaces are a part of the field.  I feel the same about \r.

I also would rather avoid throwing an Exception here.  "Be liberal in  
what you accept," as the saying goes.  Stripping it seems against the  
spirit of the format, so I guess that leaves calling it a line ending  
or whitespace.

If I call it a line ending, as CSV seems to, does that mean I need to
start reading character by character?  I will then be looking for a
\r\n|\r|\n line ending, right?  I think that rules out gets().  I
suspect that would be a massive speed hit, and I think our main focus
was to be faster.
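
To make the difference concrete, here is a small sketch comparing the
two readings on an in-memory string.  The method names are made up for
illustration, and it deliberately ignores quoted fields, so it is not a
parser, just a picture of where the lines break.

```ruby
require "stringio"

# Any of the three line endings counts; \r\n is listed first so a CRLF
# pair is consumed as one ending rather than two.
LINE_ENDING = /\r\n|\r|\n/

# Hypothetical helper: read the way gets() does by default, breaking
# on "\n" only, then chomp the trailing line ending.
def lines_via_gets(data)
  io    = StringIO.new(data)
  lines = []
  while (line = io.gets)
    lines << line.chomp
  end
  lines
end

# Hypothetical helper: treat a bare \r as a line ending too.
def lines_via_split(data)
  data.split(LINE_ENDING)
end
```

With "a,b\r\nc,d\re,f\n" as input, the gets() version keeps the bare \r
inside the second line ("c,d\re,f"), while the split version yields
three separate lines.  The split version needs the whole input in
memory, so a streaming reader would still have to scan for the endings
itself.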

Anyway, those are my thoughts.  Please everyone speak up.  I'm  
building this library for you and I would hate to force my decisions  
on you.

James Edward Gray II