Hi,

James Edward Gray II wrote:
>> Of course, we usually start writing a CSV parser with String#split or
>> String#scan, then move on to Regexp. Me, too. The first Ruby book,
>> published in 1999, contains four csv_split methods. Matz, too.
>>
>> The reasons why the current csv.rb has its state machine are:
>> * parsing IO without exhausting memory
>
> I'm not sure I understand this one. If you run into a construct like:
>
> field1,field2,"... a whole lot of data that never ends with another
> closing quote...
>
> You will need to keep reading looking for that closing quote just like
> the Regexp version will, right?

Right. Hmm. My csv.rb may need a field-size limit to avoid exhausting
memory when it is fed malformed CSV.

>> * coverage measurement
>
> Forgive my ignorance. What does this mean?

I meant measuring code coverage for quality assurance. I want to know
that each state transition is needed and works correctly while parsing,
and code coverage helps to show that. If you search ruby-talk for "csv"
and "coverage", you'll find some articles about code (statement)
coverage.

> I do very much appreciate your work though. I've used your CSV module
> many, many times. Honestly, I've never had a problem with its speed,
> but I don't parse a lot of massive CSV. It's clear some people do
> though and the truth is, I believe we can give it a boost. We'll see...

Thanks. I'm looking forward to seeing it.

Besides this, I have a C version of that state machine, but
unfortunately it cannot be published. Once the Ruby version hits the
limit of the performance I can accept, I'll rewrite that part in C.
That should be a Rubyish approach.

Regards,
// NaHi
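The field-size limit discussed above (guarding against an opening quote that is never closed) can be sketched as a streaming state machine that reads an IO one character at a time and gives up once a single field grows past a cap. This is a minimal illustration under stated assumptions, not csv.rb's code or API: the names `TinyCsvReader`, `max_field_size:`, and `FieldTooLargeError` are hypothetical, and it assumes `\n` line endings with `""` as the escaped quote inside quoted fields.

```ruby
require "stringio"

# Hypothetical error raised when one field exceeds the configured cap.
class FieldTooLargeError < StandardError; end

# Hypothetical minimal state-machine CSV reader (not csv.rb).
# Assumes "\n" line endings, '"' quoting, and '""' as an escaped quote.
# States: :field_start, :unquoted, :quoted, :quote_seen.
class TinyCsvReader
  def initialize(io, max_field_size: 1024)
    @io = io
    @max_field_size = max_field_size
  end

  # Yields each row as soon as its "\n" is read, so the whole input
  # is never held in memory; no single field may exceed the cap.
  def each_row(&block)
    return enum_for(:each_row) unless block

    @row = []
    @field = +""
    state = :field_start
    @io.each_char do |c|
      state = step(state, c, block)
      if @field.size > @max_field_size
        raise FieldTooLargeError, "field longer than #{@max_field_size} chars"
      end
    end
    raise ArgumentError, "unterminated quoted field" if state == :quoted

    # Flush a final row that was not followed by "\n".
    finish_row(block) unless state == :field_start && @row.empty?
  end

  private

  # Consumes one character and returns the next state.
  def step(state, c, block)
    case state
    when :field_start, :unquoted
      return :quoted if c == '"' && state == :field_start
      return separator(c, block) if c == "," || c == "\n"

      @field << c
      :unquoted
    when :quoted
      return :quote_seen if c == '"'

      @field << c
      :quoted
    when :quote_seen
      if c == '"' # '""' inside quotes is a literal quote
        @field << '"'
        :quoted
      elsif c == "," || c == "\n"
        separator(c, block)
      else
        raise ArgumentError, "unexpected #{c.inspect} after closing quote"
      end
    end
  end

  # "," ends the current field; "\n" ends the field and the row.
  def separator(c, block)
    if c == "\n"
      finish_row(block)
    else
      @row << @field
      @field = +""
    end
    :field_start
  end

  # Appends the pending field and hands the completed row to the caller.
  def finish_row(block)
    @row << @field
    block.call(@row)
    @row = []
    @field = +""
  end
end

reader = TinyCsvReader.new(StringIO.new("name,note\nruby,\"says \"\"hi\"\"\"\n"))
reader.each_row { |row| p row }
```

With a cap in place, a never-closed quote turns into a `FieldTooLargeError` after at most `max_field_size` characters of that field, instead of an unbounded read. Rows are still handed out one at a time, so memory use tracks the current row rather than the whole input.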
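The goal of knowing that each state transition is both needed and exercised can be made concrete by driving the machine from an explicit transition table and recording which entries a set of sample inputs actually hits. This is a sketch under assumptions: the table is a hypothetical, simplified CSV machine (not csv.rb's), and tracking transitions this way is a swapped-in complement to the statement coverage the post refers to, not the method it describes.

```ruby
require "set"

# Hypothetical transition table for a simplified CSV state machine
# (not csv.rb's real one). Keys are [state, input class]; values are
# the next state. A missing key means the input is rejected.
TRANSITIONS = {
  [:field_start, :quote]   => :quoted,
  [:field_start, :comma]   => :field_start,
  [:field_start, :newline] => :field_start,
  [:field_start, :other]   => :unquoted,
  [:unquoted, :quote]      => :unquoted,
  [:unquoted, :comma]      => :field_start,
  [:unquoted, :newline]    => :field_start,
  [:unquoted, :other]      => :unquoted,
  [:quoted, :quote]        => :quote_seen,
  [:quoted, :comma]        => :quoted,
  [:quoted, :newline]      => :quoted,
  [:quoted, :other]        => :quoted,
  [:quote_seen, :quote]    => :quoted,
  [:quote_seen, :comma]    => :field_start,
  [:quote_seen, :newline]  => :field_start
}.freeze

# Maps a character to the input class used as part of a table key.
def input_class(c)
  case c
  when '"'  then :quote
  when ","  then :comma
  when "\n" then :newline
  else :other
  end
end

# Runs the machine over text and adds every transition taken to seen.
def run_machine(text, seen)
  state = :field_start
  text.each_char do |c|
    key = [state, input_class(c)]
    state = TRANSITIONS.fetch(key) { raise ArgumentError, "no transition for #{key.inspect}" }
    seen << key
  end
  state
end

seen = Set.new
["a,b\n", "\"x\"\"y\",z\n"].each { |sample| run_machine(sample, seen) }

# Transitions no sample exercised: each needs a test case, or is unneeded.
untested = TRANSITIONS.keys.reject { |key| seen.include?(key) }
puts "untested transitions:"
untested.each { |key| puts "  #{key.inspect}" }
```

Each key left in `untested` points either to a missing test input or to a transition that may not be needed at all. For plain statement coverage, current Ruby also ships a standard `Coverage` module (`Coverage.start` before loading the code, `Coverage.result` afterwards).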