On Nov 3, 2005, at 12:20 AM, NAKAMURA, Hiroshi wrote:

> Of course we usually start CSV parser with String#split, String#scan,
> then RegExp.  Me, too.  The initial ruby book published in 1999
> contains 4 types of csv_split methods.  Matz, too.
>
> The reasons why the current csv.rb has its state machine are;
> * parsing IO without exhausting memory

I'm not sure I understand this one.  If you run into a construct like:

    field1,field2,"... a whole lot of data that never ends with another closing quote...

You will need to keep reading, looking for that closing quote, just like the Regexp version will, right?

> * coverage measurement

Forgive my ignorance.  What does this mean?

> * fs and rs customize

I've been looking into this for FasterCSV.  The Regexp can have the field separator interpolated in, and it should work for any value that isn't a " or part of the row separator.  The row separator will just be fed to gets(), so again, any sane value (including \r, \n, and \r\n) will work just fine.

> You can parse IO with RegExp.  IO#gets, scan with RegExp, concatenate
> the rest with the next line and continue.  It should be faster and it
> can be clearer(?) than the current csv.rb.

That's my hope.  I'm working on it now.

I do very much appreciate your work though.  I've used your CSV module many, many times.  Honestly, I've never had a problem with its speed, but I don't parse a lot of massive CSV files.  It's clear some people do though, and the truth is, I believe we can give it a boost.  We'll see...

James Edward Gray II

P.S.  I am working on FasterCSV, but I'm swamped this week and off on vacation starting Monday.  I'll get something out as soon as possible.  Hopefully in about two more weeks...
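
For anyone following along, here is a rough sketch of the gets()-plus-Regexp idea discussed above: read a line, keep appending lines while a quoted field is still open, then scan the row with a Regexp that has the field separator interpolated in. This is only an illustration, not FasterCSV's actual code. The method name each_csv_row and its parameters are made up for the example, and it assumes well-formed input (it does not detect malformed rows).

```ruby
require "stringio"

# Hypothetical helper (not FasterCSV's actual code): yields one Array of
# fields for each CSV row read from +io+.
def each_csv_row(io, fs = ",", rs = "\n")
  sep = Regexp.escape(fs)
  # A field is either quoted ("" stands for a literal quote) or an unquoted
  # run containing no quote and no separator; every field ends with +fs+.
  field = /\G(?:"((?:[^"]|"")*)"|((?:(?!#{sep})[^"])*))#{sep}/

  while (line = io.gets(rs))
    # An odd number of quotes means a quoted field is still open, so keep
    # appending lines until it closes or the input runs out.
    while line.count('"').odd? && (more = io.gets(rs))
      line << more
    end
    # Append one separator so the last field matches the same pattern.
    row = (line.chomp(rs) + fs).scan(field).map do |quoted, plain|
      quoted ? quoted.gsub('""', '"') : plain
    end
    yield row
  end
end

# Usage: the second row has a quoted field that spans two lines.
data = "a,\"b \"\"q\"\", c\",d\n1,\"two\nlines\",3\n"
each_csv_row(StringIO.new(data)) { |row| p row }
```

Note that the inner loop is exactly the "keep reading, looking for that closing quote" case: if the quote never closes, this sketch reads to the end of the input, just as any other approach would have to.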