On Tue, 1 Nov 2005, James Edward Gray II wrote:

> On Oct 29, 2005, at 12:11 PM, Ara.T.Howard wrote:
>
>> it may or may not be tricky to get these failing cases working though:
>
> I'm building a test suite for the new library, including all the edge cases
> you posted.  Can you tell me what exactly you meant to check with the
> following test:
>
>   [
>     %( \r,"\r" ),
>     [nil,"\r"]
>   ],
>
> You are strip()ing those before testing them, so the method is actually fed
> %Q{,"\r"}.  The first \r is trimmed, as you can see.  With that input, we all
> agree on your answer.

ah... my mistake.  i meant to test %(\r,"\r"), so, some whitespace noise (blank
line, etc), followed by a record consisting of an empty cell and a cell
containing \r.  basically no chars like \r or \b may exist outside of double
quotes.  to handle this there are two options: throw an error or ignore blank
(all whitespace) lines.  my feeling is that a file like this

  name,age

  jim,16

  john,32

should parse - so i'm in the 'ignore open whitespace' camp... but it's an
opinion.

> If you meant to leave the \r in, it gets more complicated.  My code says
> that's ["\r", "\r"].

hmmm.  that can't be right.  no matter what - a \r or \n that is not between
quotes cannot be part of a cell:

  field       = (escaped / non-escaped)
  escaped     = DQUOTE *(TEXTDATA / COMMA / CR / LF / 2DQUOTE) DQUOTE
  non-escaped = *TEXTDATA

so comma, \r, \n, and " can __never__ be part of a cell unless double quoted.

> CSV says that's [].  I guess it is considering the \r a line end.  I have a
> hard time convincing myself that behavior is correct though.

that's wrong too i think.

> The way I read the RFC, the only valid line ending in the format is
> \015\012.  It also says that line endings must be enclosed in double quotes.

not quite - it says that CR __or__ LF must be in double quotes.  but a line
ending is CRLF.  so yes, a line ending must be CRLF, but __any__ CR or LF that
is part of a cell must be double quoted.
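
A minimal sketch of the strict reading of the grammar quoted above, for a single record with its line ending already removed. The names parse_record and BareCharError are hypothetical and are not part of CSV or of the new library discussed in this thread. TEXTDATA is assumed to be the printable ASCII range without comma and double quote, as in RFC 4180, and an empty unquoted cell is assumed to become nil, to match the [nil,"\r"] expectation in the test case.

```ruby
require "strscan"

# Hypothetical error class for any character the grammar does not allow
# outside of double quotes (bare \r, \n, ", control chars, ...).
class BareCharError < StandardError; end

# non-escaped = *TEXTDATA  (printable ASCII except comma and double quote)
NON_ESCAPED = /[\x20-\x21\x23-\x2B\x2D-\x7E]*/
# escaped = DQUOTE *(TEXTDATA / COMMA / CR / LF / 2DQUOTE) DQUOTE
ESCAPED = /"((?:[\x20-\x21\x23-\x7E\r\n]|"")*)"/

# Parse one record (no trailing line ending) into an array of cells.
# Raises BareCharError instead of guessing what a stray character means.
def parse_record(record)
  scanner = StringScanner.new(record)
  fields  = []
  loop do
    if scanner.scan(ESCAPED)
      # Quoted cell: drop the outer quotes and collapse doubled quotes.
      fields << scanner[1].gsub('""', '"')
    else
      # Unquoted cell: may be empty, which is reported as nil (assumption).
      text = scanner.scan(NON_ESCAPED)
      fields << (text.empty? ? nil : text)
    end
    break if scanner.eos?
    unless scanner.scan(/,/)
      raise BareCharError,
            "unexpected #{scanner.peek(1).inspect} at offset #{scanner.pos}"
    end
  end
  fields
end

parse_record(%Q{,"\r"})   # => [nil, "\r"]
parse_record("jim,16")    # => ["jim", "16"]
begin
  parse_record(%Q{\r,"\r"})
rescue BareCharError
  # the bare \r before the comma is rejected rather than kept as a cell
end
```

Under these assumptions the stripped input gives the answer everyone agreed on, and the unstripped input is rejected outright. Skipping all-whitespace lines, the other option mentioned above, would be a separate step that runs before a record ever reaches parse_record.
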
this is really making me think that a bare CR (\r) should be a syntax error.

> leaves me feeling that either a field containing \r becomes "\r", just as my
> code says, because it's not a line ending, or it's malformed CSV.  We still
> need a preferred way to handle it though.

hmmm.  what about bare commas, newlines, and quote marks then?  why treat \r
specially?  i say any bad char is a syntax error.  including bare \r, comma,
\n, or ".

> I'm inclined to ignore the RFC here and call Ruby's platform dynamic \n a
> line ending.  I'm also inclined to say that means \r isn't (on all platforms
> where \n != \015, which includes Windows and Unix) and my code is correct,
> it's a "\r" field.
>
> Object now if you think I'm crazy...  ;)

not at all.  i don't know what the right thing to do is - but you've got my
2cts.  anyone else?

cheers.

-a
--
===============================================================================
| email :: ara [dot] t [dot] howard [at] noaa [dot] gov
| phone :: 303.497.6469
| anything that contradicts experience and logic should be abandoned.
| -- h.h. the 14th dalai lama
===============================================================================