Hi Robert!

	VERY IMPRESSIVE!!!  After tweaking your two regexps from
(\w\d+\w) to (\w+\d+\w) (because 'AD2005U' is also a valid contract),
I got this:

    0.047s read ref
    0.062s read file c:/src/barchart/Data/mrn09215.txt
    0.047s read file c:/src/barchart/Data/mrn09205.txt
    0.063s read file c:/src/barchart/Data/mrn09195.txt
    0.078s read file c:/src/barchart/Data/mrn09165.txt
    0.062s read file c:/src/barchart/Data/mrn09155.txt
    0.047s read file c:/src/barchart/Data/mrn09145.txt
    0.047s read file c:/src/barchart/Data/mrn09135.txt
    0.078s read file c:/src/barchart/Data/mrn09125.txt
    0.063s read file c:/src/barchart/Data/mrn09095.txt
    0.172s read file c:/src/barchart/Data/mrn09085.txt
    0.109s read file c:/src/barchart/Data/mrn09075.txt
    0.094s read file c:/src/barchart/Data/mrn09065.txt
    0.047s read file c:/src/barchart/Data/mrn09025.txt
    0.062s read file c:/src/barchart/Data/mrn09015.txt
    1.547s read file c:/src/barchart/Data/mrnaug05.txt
    1.531s read file c:/src/barchart/Data/mrnjul05.txt
    1.141s read file c:/src/barchart/Data/mrnjun05.txt
    1.375s read file c:/src/barchart/Data/mrnmay05.txt
    1.734s read file c:/src/barchart/Data/mrnapr05.txt
    0.907s finished post processing
    9.313s total

1415 total contracts
136164 total ticks (averages out to 96 ticks per contract)

	I ought to point out that the 'ref' file is not actually
processed in this case (meaning that its ticks are not recorded),
but processing it would probably add only another 0.078s or so.

	Also, I was expecting a higher average tick count per contract
than that, so I will have to figure out why some ticks are being
missed... but it is probably just a minor regexp tweak.

	Another minor note is that volume and openInterest were
not recorded, but that is a very minor thing to add on.

	So now the score is:
Glenn's Ruby-Only: ~29 seconds
Robert's Ruby-Only: ~9 seconds
Glenn's Ruby/C++: ~2 seconds

	Great job, Robert!  Now, to answer your questions below...

Robert Klemme wrote:
> Glenn,
> 
> here's my first shot.  It doesn't seem very fast but then again there 
> are still some uncertainties:
> 
> - How many contracts are typically in a reference day's set?

C:\>wc \src\barchart\data\mrn09225.txt
    1712    1712   79305    \src\barchart\data\mrn09225.txt
About 1700, but only 1415 matched the (\w+\d+\w) regexp, and those
are the only ones I care about.
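
	As a rough illustration of why the tweak matters (a sketch only:
I'm assuming the patterns are applied unanchored, and 'C2005Z' is a
made-up symbol; only 'AD2005U' is a real example from above):

```ruby
# Old pattern: exactly one word character before the digits.
OLD_RE = /(\w\d+\w)/
# New pattern: one or more word characters before the digits.
NEW_RE = /(\w+\d+\w)/

# Hypothetical helper: return the captured contract symbol, or nil.
def capture(re, text)
  text[re, 1]
end

# A single-letter root is captured whole by both patterns.
p capture(OLD_RE, 'C2005Z')
p capture(NEW_RE, 'C2005Z')

# With a two-letter root, the old pattern starts matching at the 'D'
# and silently drops the leading 'A'; the new one keeps the whole symbol.
p capture(OLD_RE, 'AD2005U')
p capture(NEW_RE, 'AD2005U')
```

If the real script anchors the pattern to the start of a field, the old
regexp would skip 'AD2005U' entirely instead of truncating it.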

> - How many contracts are there in total?

	I don't know... probably over 5000, depending on how far back
you go, because older contracts expire and newer contracts start up.
The last letter of the symbol represents the month of the contract:
F=Jan,G=Feb,H=Mar,J=Apr,K=May,M=Jun,N=Jul,Q=Aug,U=Sep,V=Oct,X=Nov,Z=Dec
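
	In Ruby, that month-code table could be written as a simple
lookup (a minimal sketch; MONTH_CODES and contract_month are names I
made up for illustration, they are not in either script):

```ruby
# Month code (last letter of the contract symbol) => month number.
MONTH_CODES = {
  'F' => 1, 'G' => 2,  'H' => 3,  'J' => 4,
  'K' => 5, 'M' => 6,  'N' => 7,  'Q' => 8,
  'U' => 9, 'V' => 10, 'X' => 11, 'Z' => 12
}.freeze

# Hypothetical helper: month number for a symbol such as 'AD2005U',
# or nil when the last character is not a known month code.
def contract_month(symbol)
  MONTH_CODES[symbol[-1]]
end

p contract_month('AD2005U')
```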

> - How many percent of the reference contracts are present in an average 
> file?

	As you start out, nearly 100%... then as you go back to earlier
and earlier dates, the reference contracts start to die out, and it may
drop down to around 90-95% or so... but in the example above, I'm only
going back around 100 days.

> - How do dates relate to files? (I assumed a file per day plus I used 
> synthetic dates; see the generator script)

	Well, there are three types of files: daily updates, monthly updates,
and yearly updates.  So far, I haven't needed to go back to any of the
yearly updates in any of the processing I've done.  Suffice it to say
that a monthly update file is basically the 'cat' (concatenation) of
all the daily files for that month, and the yearly is the cat of all
monthly files for that year.
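
	To make the daily/monthly relationship concrete, here is a small
sketch that builds a "monthly" file by concatenating "daily" ones.  The
file names and contents are invented, and I'm assuming the daily files
are simply appended in date order; build_monthly is a hypothetical
helper, not part of the real scripts:

```ruby
require 'tmpdir'

# Hypothetical helper: write the concatenation of daily_paths (in the
# order given) to monthly_path, like `cat day1 day2 ... > month`.
def build_monthly(daily_paths, monthly_path)
  File.open(monthly_path, 'wb') do |out|
    daily_paths.each { |path| out.write(File.binread(path)) }
  end
end

Dir.mktmpdir do |dir|
  # Two made-up daily files with placeholder contents.
  days = %w[day1.txt day2.txt].map { |name| File.join(dir, name) }
  File.binwrite(days[0], "first day\n")
  File.binwrite(days[1], "second day\n")

  monthly = File.join(dir, 'month.txt')
  build_monthly(days, monthly)
  print File.binread(monthly)
end
```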

> For 50 files with 212998930 bytes this took 476.511s total on my machine 
> (2.34s/MB).  Maybe you just throw it at your data set and see how it 
> works out.
> 
> Kind regards
> 
>    robert

	Nice job!  Thanks, Robert!
-- Glenn