On Oct 31, 2005, at 11:59 AM, Ara.T.Howard wrote:

>> If you meant to leave the \r in, it gets more complicated.  My
>> code says that's ["\r", "\r"].
>
> hmmm.  that can't be right.  no matter what - a \r or \n that is
> not between quotes cannot be part of a cell:
>
>   field = (escaped / non-escaped)
>
>   escaped = DQUOTE *(TEXTDATA / COMMA / CR / LF / 2DQUOTE) DQUOTE
>
>   non-escaped = *TEXTDATA
>
> so comma, \r, \n, and " can __never__ be part of a cell unless
> double quoted.

Ah, I forgot to glance at the grammar.

The above pretty much proves (to me) that we're looking at malformed
CSV.  That's good news, because it means whatever we decide is
right.  ;)  We're off the map here anyway.

To me, it's a decision about what you consider \r to be.  The current
CSV seems to call it a line ending.  I'm more inclined to consider it
whitespace.  If I found a \v (vertical tab), I would leave it in the
field because spaces are a part of the field.  I feel the same about
\r.

I also would rather avoid throwing an Exception here.  "Be liberal in
what you accept," as the saying goes.  Stripping it seems against the
spirit of the format, so I guess that leaves calling it a line ending
or whitespace.

If I call it a line ending, as CSV seems to, does that mean I need to
start reading character by character?  I will then be looking for a
\r\n|\r|\n line ending, right?  I think that rules out gets().  I
suspect that would be a massive speed hit, and I think our main focus
was to be faster.

Anyway, those are my thoughts.  Please everyone speak up.  I'm
building this library for you and I would hate to force my decisions
on you.

James Edward Gray II
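
To make the two choices above concrete, here is a minimal Ruby sketch of
how a bare \r in an unquoted field comes out under each reading.  It is
not code from either library: the helper names rows_cr_as_whitespace and
rows_cr_as_line_ending are hypothetical, and the sketch assumes input
with no quoted fields at all, so it sidesteps the real parsing work.

```ruby
require "stringio"

# Hypothetical helper: only "\n" (optionally preceded by "\r") ends a row,
# so gets() still works and a bare "\r" stays in the field as whitespace.
# Assumes no quoted fields.
def rows_cr_as_whitespace(io)
  rows = []
  while (line = io.gets("\n"))
    rows << line.sub(/\r?\n\z/, "").split(",", -1)
  end
  rows
end

# Hypothetical helper: "\r\n", "\r", and "\n" all end a row.  gets() takes
# a single string separator, so this sketch reads the whole input and
# splits it with a regex instead.  Assumes no quoted fields.
def rows_cr_as_line_ending(io)
  io.read.split(/\r\n|\r|\n/).map { |line| line.split(",", -1) }
end

data = "a,b\rc,d\n1,2\n"

p rows_cr_as_whitespace(StringIO.new(data))
# prints [["a", "b\rc", "d"], ["1", "2"]]

p rows_cr_as_line_ending(StringIO.new(data))
# prints [["a", "b"], ["c", "d"], ["1", "2"]]
```

The second helper reads the entire input before splitting, which is the
easy way out for a sketch.  A streaming version would have to scan for
the three endings itself, and that is where the speed concern comes in.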