-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

James Edward Gray II wrote:
>> Of course we usually start a CSV parser with String#split, String#scan,
>> then Regexp.  Me, too.  The first Ruby book, published in 1999, contains
>> 4 types of csv_split methods.  Matz, too.
>>
>> The reasons why the current csv.rb has its state machine are:
>>  * parsing IO without exhausting memory
> 
> I'm not sure I understand this one.  If you run into a construct like:
> 
> field1,field2,"... a whole lot of data that never ends with another
> closing quote...
> 
> You will need to keep reading looking for that closing quote just like
> the Regexp version will, right?

Right.  Hmm.  My csv.rb may need a field-size limit to avoid exhausting
memory when it is given broken CSV input.
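
A minimal sketch of how such a limit could look, assuming a
character-at-a-time state machine reading from an IO.  This is not the
code in csv.rb: parse_row, FieldTooLargeError, max_field_size and the
state names are all hypothetical, and it assumes rows end with "\n" only.

```ruby
require 'stringio'

# Hypothetical error class, not part of csv.rb.
class FieldTooLargeError < StandardError; end

# Read one CSV row from io, one character at a time.
# Raises FieldTooLargeError as soon as a single field grows past
# max_field_size bytes, so a missing closing quote cannot make the
# parser buffer the rest of the input.
# Returns an array of strings, or nil at end of input.
def parse_row(io, max_field_size = 1024)
  row = []
  field = ''.dup
  state = :start_of_field
  saw_input = false

  while (ch = io.getc)
    saw_input = true
    case state
    when :start_of_field, :unquoted
      if ch == '"' && state == :start_of_field
        state = :quoted
      elsif ch == ','
        row << field
        field = ''.dup
        state = :start_of_field
      elsif ch == "\n"
        row << field
        return row
      else
        field << ch
        state = :unquoted
      end
    when :quoted
      if ch == '"'
        state = :quote_in_quoted
      else
        field << ch
      end
    when :quote_in_quoted
      if ch == '"'
        field << '"'            # "" inside quotes is an escaped quote
        state = :quoted
      elsif ch == ','
        row << field
        field = ''.dup
        state = :start_of_field
      elsif ch == "\n"
        row << field
        return row
      else
        raise ArgumentError, "unexpected #{ch.inspect} after closing quote"
      end
    end

    if field.bytesize > max_field_size
      raise FieldTooLargeError, "field exceeds #{max_field_size} bytes"
    end
  end

  return nil unless saw_input
  raise ArgumentError, 'unterminated quoted field' if state == :quoted
  row << field
  row
end
```

With this sketch, the never-closed quote from the example above fails
after max_field_size bytes instead of reading to the end of the input:
parse_row(StringIO.new('f1,f2,"' + 'x' * 5000), 16) raises
FieldTooLargeError.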

>>  * coverage measurement
> 
> Forgive my ignorance.  What does this mean?

I meant measuring code coverage for quality assurance.  I want to know
that each state transition is needed and works correctly while parsing.
Code coverage helps to see that.  Search ruby-talk for "csv" and
"coverage" and you'll find some articles about code (statement) coverage.

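To illustrate the idea, here is a small sketch that records which
transitions of a toy state machine a set of test inputs exercises.  It
is hand-rolled transition tracking rather than a statement coverage
tool, and the table, states and method names are hypothetical, not
those of csv.rb.

```ruby
require 'set'

# Hypothetical transition table for a toy quoted-field scanner.
# Keys are [state, input class]; values are the next state.
TRANSITIONS = {
  [:unquoted,  :quote] => :quoted,
  [:unquoted,  :other] => :unquoted,
  [:quoted,    :quote] => :maybe_end,
  [:quoted,    :other] => :quoted,
  [:maybe_end, :quote] => :quoted,    # "" inside quotes
  [:maybe_end, :other] => :unquoted
}.freeze

# Run the machine over text, adding every transition taken to seen.
def run_machine(text, seen)
  state = :unquoted
  text.each_char do |ch|
    key = [state, ch == '"' ? :quote : :other]
    seen << key
    state = TRANSITIONS.fetch(key)
  end
  state
end

# Return the transitions that none of the inputs exercised.
def untested_transitions(inputs)
  seen = Set.new
  inputs.each { |text| run_machine(text, seen) }
  TRANSITIONS.keys.reject { |key| seen.include?(key) }
end
```

For example, untested_transitions(['a,"b",c']) reports that
[:maybe_end, :quote] was never taken; adding the input '"x""y"' to the
list covers it.  A transition that no realistic input can reach is a
hint that it may not be needed.
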
> I do very much appreciate your work though.  I've used your CSV module
> many, many times.  Honestly, I've never had a problem with its speed,
> but I don't parse a lot of massive CSV.  It's clear some people do
> though and the truth is, I believe we can give it a boost.  We'll see...

Thanks.  I'm looking forward to seeing it.

Besides this, I have a C version of that state machine, but unfortunately
it cannot be published.  Once the Ruby version hits the ceiling of my
allowable performance limit, I'll rewrite that part in C.  I think that
is the rubyish approach.

Regards,
// NaHi
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (Cygwin)

iD8DBQFDatJIf6b33ts2dPkRAi06AJ9z63td91WXFpo2Wk1KQNcy7RUuLACeKJL5
yMlcJlfajRXVoDfwJt8id44=
=MqXT
-----END PGP SIGNATURE-----