On Nov 3, 2005, at 12:20 AM, NAKAMURA, Hiroshi wrote:

> Of course we usually start a CSV parser with String#split, String#scan,
> then Regexp.  Me, too.  The initial Ruby book, published in 1999,
> contains 4 types of csv_split methods.  Matz, too.
>
> The reasons why the current csv.rb uses a state machine are:
>  * parsing IO without exhausting memory

I'm not sure I understand this one.  If you run into a construct like:

field1,field2,"... a whole lot of data that never ends with another
closing quote...

You will need to keep reading, looking for that closing quote, just
like the Regexp version will, right?
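
Just so we're picturing the same thing, here's a throwaway sketch (not
code from either library) of what I mean.  However the fields get
parsed afterward, an open quote forces more reads, and the buffer grows
the same either way:

  def complete_row(io, rs = "\n")
    row = io.gets(rs) or return nil
    # An odd number of double quotes means a quoted field is still
    # open, so pull in the next line and keep looking for the close.
    while row.count('"') % 2 == 1 and more = io.gets(rs)
      row << more
    end
    row
  end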

>  * coverage measurement

Forgive my ignorance.  What does this mean?

>  * fs and rs customization

I've been looking into this for FasterCSV.

The Regexp can have the field separator interpolated into it, and that
should work for any value that isn't a double quote or part of the row
separator.  The row separator will just be fed to gets(), so again,
any sane value (including \r, \n, and \r\n) will work just fine.
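
Roughly like this, with made-up fs/rs values and a made-up file name,
and ignoring quoting entirely just to show the separator handling:

  fs = ";"                    # any single character that isn't a quote
  rs = "\r\n"                 # handed straight to gets()
  split_re = /#{Regexp.escape(fs)}/
  File.open("data.csv") do |io|
    while line = io.gets(rs)
      # -1 keeps trailing empty fields
      p line.chomp(rs).split(split_re, -1)
    end
  end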

> You can parse IO with a Regexp: IO#gets, scan with the Regexp,
> concatenate the rest with the next line, and continue.  It should be
> faster, and it can be clearer(?) than the current csv.rb.

That's my hope.  I'm working on it now.
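
The rough shape I'm playing with looks something like the sketch
below.  The "," separator is hard-coded and that splitting Regexp is
only illustrative, so don't hold me to any of it:

  require "stringio"

  # Split on commas that are not inside a quoted field:  a comma is
  # outside quotes when an even number of quotes follows it.
  SPLIT_RE = /,(?=(?:[^"]*"[^"]*")*[^"]*\z)/

  def parse(io)
    rows   = []
    buffer = ""
    while line = io.gets
      buffer << line
      # An odd quote count means a quoted field runs into the next
      # line, so concatenate and keep reading, just as you describe.
      next if buffer.count('"') % 2 == 1
      fields = buffer.chomp.split(SPLIT_RE, -1).map do |f|
        f =~ /\A"(.*)"\z/m ? $1.gsub('""', '"') : f
      end
      rows << fields
      buffer = ""
    end
    rows
  end

  p parse(StringIO.new(%Q{a,b,"multi\nline, field",c\n}))
  # => [["a", "b", "multi\nline, field", "c"]]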

I do very much appreciate your work, though.  I've used your CSV
module many, many times.  Honestly, I've never had a problem with its
speed, but I don't parse a lot of massive CSV.  It's clear some people
do, though, and the truth is, I believe we can give it a boost.
We'll see...

James Edward Gray II

P.S.  I am working on FasterCSV, but I'm swamped this week and off on  
vacation starting Monday.  I'll get something out as soon as  
possible.  Hopefully in about two more weeks...