On Sep 18, 2008, at 10:16 PM, Michael Selig wrote:

> IO#gets and IO#each_line are typically what I am using. Can't you  
> use them for CSV?

Yes, and the primary parser does.

The code I mentioned is part of the auto line ending detection feature  
the library provides.  I read ahead in the data, 1024 bytes at a time,  
hunting for common line endings.  When I see a \r at the end of the  
String I also read one more byte (should be character, of course) to  
see if a \n follows.
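
A minimal sketch of that read-ahead, for illustration only. The method
name detect_line_ending and its chunk_size parameter are hypothetical,
and this is not the library's actual code; it only assumes an object
that responds to read(n).

```ruby
require "stringio"

# Hypothetical helper: scan ahead in fixed-size chunks for the first
# common line ending ("\r\n", "\n", or "\r").
def detect_line_ending(io, chunk_size = 1024)
  while (chunk = io.read(chunk_size))
    # A trailing "\r" may be the first half of "\r\n", so read one more
    # byte before deciding. (Bytes, not characters, as noted above.)
    if chunk.end_with?("\r")
      extra = io.read(1)
      chunk << extra if extra
    end
    ending = chunk[/\r\n|\n|\r/]
    return ending if ending
  end
  nil # no line ending found before EOF
end

p detect_line_ending(StringIO.new("a,b\r\nc,d\r\n"), 4)
```

The chunk size of 4 in the usage line forces the first read to stop
right after the \r, which is the case the extra one-byte read exists to
handle.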

> The line terminator ("sep") parameter can be an arbitrary string,  
> and the "limit" parameter, although specified in bytes, will always  
> round to a character boundary. Matz confirmed this behaviour. I  
> think you can even set the line terminator to a null string, and  
> just use the limit, which means that gets works almost like read,  
> except it never splits characters.

That's very interesting.  I knew about both the separator String and  
the limit, but not this magic behavior of reading up to character  
boundaries.
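
A small sketch of the gets-with-limit idea, assuming a plain File
opened with an explicit external encoding. The helper name
char_safe_chunks is hypothetical. It passes nil as the separator rather
than an empty String, because an empty String selects paragraph mode in
IO#gets; nil disables line splitting so that only the limit applies.

```ruby
require "tempfile"

# Hypothetical helper: read a file in pieces of roughly `limit` bytes,
# relying on IO#gets to extend a piece that would otherwise end in the
# middle of a multibyte character.
def char_safe_chunks(path, limit, encoding = "UTF-8")
  File.open(path, "r:#{encoding}") do |io|
    chunks = []
    while (chunk = io.gets(nil, limit))
      chunks << chunk
    end
    chunks
  end
end

Tempfile.create("chunks") do |f|
  f.write("привет") # six characters, two bytes each in UTF-8
  f.close
  chunks = char_safe_chunks(f.path, 3)
  puts chunks.all?(&:valid_encoding?)
end
```

Whether an IO-like object such as Zlib::GzipReader behaves the same way
is not something this sketch establishes.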

I would switch to this feature, but the problem code is a little more  
complex than I first let on.  It has been patched to support working  
with a Zlib::GzipReader and I'm betting that isn't going to support  
this special read behavior.

Interestingly, I was having basically the same idea for fixing my  
problem.  I thought I would read() some byte count and try a harmless  
operation on the end of the String that would trigger the bad data  
error, if it is there.  While it is, I would keep reading one more  
byte.  Hopefully that's safe enough.
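
A rough sketch of that approach, using String#valid_encoding? as the
harmless operation. The method name read_whole_chars, the encoding
argument, and the max_extra cap are all assumptions made for
illustration; the cap is there so that genuinely invalid data cannot
cause the loop to consume the rest of the stream. It only needs an
object that responds to read(n), so it should not depend on any special
gets behavior.

```ruby
require "stringio"

# Hypothetical helper: read `size` bytes, then keep reading one byte at
# a time while the chunk ends in a partial character.
def read_whole_chars(io, size, encoding = Encoding::UTF_8, max_extra = 8)
  chunk = io.read(size)
  return nil if chunk.nil?

  chunk.force_encoding(encoding)
  extra = 0
  # valid_encoding? is false when the chunk was cut mid-character.
  until chunk.valid_encoding? || extra >= max_extra
    byte = io.read(1)
    break if byte.nil? # EOF: return what we have
    # Tag the byte with the same encoding so the append cannot raise
    # Encoding::CompatibilityError.
    chunk << byte.force_encoding(encoding)
    extra += 1
  end
  chunk
end

io = StringIO.new("h\u00e9llo")
p read_whole_chars(io, 2)
```

This only repairs a character split at the end of a chunk. It assumes
each read starts on a character boundary, which holds as long as every
earlier read went through the same helper.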

James Edward Gray II