On Sat, 10 Sep 2005, Glenn M. Lewis wrote:

> Thanks a bunch, Hugh and Eric!  The combination of your
> two suggestions sped it up quite a bit.
>
> I don't agree with Robert, though... I have written many
> parsers in C++ (and before that, C) that could soak up
> all the data that I'm reading in less than a second whereas
> this is taking approximately 9 minutes in Ruby.  With the
> recommendations of Hugh and Eric, it is now down to about
> 5 minutes, or almost a factor of 2 speedup.
>
> I would really like an order of magnitude or more, but
> I would definitely have to write it in a compiled language.
> I've done this before with Ruby and C++ using SWIG, but
> this particular one seemed really challenging when having
> Ruby call C++ which would then call Ruby...
>
> My last project with Ruby/C++/SWIG had Ruby calling C++
> but C++ kept all the data structures internally without
> ever calling Ruby, and this was *much* easier... but not
> as flexible as I would like for this case.
>
> I may have to rewrite this whole puppy in D if I'm going
> to get parsing times under one second.  Using C++ and STL
> for its map containers is a royal nuisance, but D has
> built-in associative arrays.  Or maybe I should try Perl
> or Python and see how their file parsing speeds compare.
>
> Oh, and to answer Hugh's question, it is extremely rare
> that a line would have fewer than 8 fields... sometimes
> the last line of the file has only a ^Z on it.
>
> Thanks again for your help!  I appreciate it.
> -- Glenn

can you send a sample data set (contact me offline if you wish) and expected
time to parse and let us have a crack?  those times sound distressing - is
your data HUGE?
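
for comparing numbers, something like the following rough harness might be a
useful baseline.  it is only a sketch under assumptions: the real format has
not been posted, so whitespace-separated fields, the `parse` method, the
`MIN_FIELDS` constant, and the synthetic sample data are all made up here for
illustration.  it times a plain split-per-line loop and skips lines with fewer
than 8 fields (such as a trailing line holding only ^Z).

```ruby
require 'benchmark'
require 'stringio'

# ASSUMPTION: data lines carry at least 8 whitespace-separated fields.
MIN_FIELDS = 8

# Hypothetical parser: split each line and keep only complete rows.
def parse(io)
  rows = []
  io.each_line do |line|
    fields = line.split
    next if fields.size < MIN_FIELDS   # drops short lines, e.g. a lone ^Z
    rows << fields
  end
  rows
end

# Synthetic stand-in for the real data set, ending with a ^Z-only line.
sample = ("a b c d e f g h\n" * 100_000) + "\x1a\n"

rows = nil
elapsed = Benchmark.realtime { rows = parse(StringIO.new(sample)) }
puts "parsed #{rows.size} rows in #{'%.3f' % elapsed}s"
```

swapping the StringIO for File.open on the real file would give a floor for
pure read-and-split time, which helps show how much of the total is spent in
the rest of the parser.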

cheers.

-a
-- 
===============================================================================
| email :: ara [dot] t [dot] howard [at] noaa [dot] gov
| phone :: 303.497.6469
| Your life dwells among the causes of death
| Like a lamp standing in a strong breeze.  --Nagarjuna
===============================================================================