Yes, there are three types of files: daily, monthly, and
yearly.  In processing the data, I typically take the last 120 days
or so and load all the 'ticks' (I call each line a 'tick') for
every contract (the first field is the contract name) that has a
tick in today's daily file.  The first optimization is to ignore
any contract that turns up in the other files but wasn't found in
today's daily file.  This alone knocks off a huge amount of
processing time.
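
	As a rough sketch of that filter in plain Ruby (not the
actual C++ parser): it assumes only that each line is comma-separated
with the contract name first.  The names 'contracts_in' and
'live_ticks' are made up for illustration, and every sample line
except the first one is invented.

```ruby
require 'set'

# Assumption: comma-separated lines, contract name in the first field.

# Collect the contract names that appear in today's daily file.
def contracts_in(lines)
  names = Set.new
  lines.each do |line|
    name = line.split(',', 2).first
    names << name.strip unless name.nil? || name.strip.empty?
  end
  names
end

# Keep only the lines of an older file whose contract is in the live set.
def live_ticks(lines, live)
  lines.select { |line| live.include?(line.split(',', 2).first.to_s.strip) }
end

today = ["A62005U,050909,0.7726,0.7758,0.7703,0.7737,12366,0"]
older = [
  "A62005U,050908,0.7700,0.7730,0.7690,0.7726,11000,0", # invented values
  "ZZ2004H,040301,1.0,1.1,0.9,1.0,10,0"                 # invented, expired
]

live = contracts_in(today)
kept = live_ticks(older, live)   # only the A62005U line survives
```

	Building the set from the daily file first means every other
file costs one set lookup per line, with no further parsing for
contracts that are no longer traded.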

	Here's a dump of my data directory:
  Directory of C:\src\barchart\Data

09/07/2005  09:26 AM            79,633 mrn09015.txt
09/07/2005  09:26 AM            79,526 mrn09025.txt
09/07/2005  09:26 AM            79,282 mrn09065.txt
09/08/2005  02:20 PM            79,700 mrn09075.txt
09/08/2005  07:36 PM            80,014 mrn09085.txt
09/09/2005  05:24 PM            80,092 mrn09095.txt
09/12/2005  04:13 PM            78,405 mrn09125.txt
09/14/2005  12:33 AM            80,065 mrn09135.txt
09/14/2005  04:58 PM            80,804 mrn09145.txt
09/15/2005  09:24 PM            80,344 mrn09155.txt
09/16/2005  05:20 PM            80,318 mrn09165.txt
05/01/2003  11:03 AM         1,380,685 mrnapr03.txt
05/01/2004  11:07 AM         1,554,968 mrnapr04.txt
04/30/2005  07:26 PM         1,573,078 mrnapr05.txt
08/30/2003  10:48 AM         1,443,433 mrnaug03.txt
09/01/2004  11:19 AM         1,632,816 mrnaug04.txt
09/01/2005  06:23 PM         1,806,148 mrnaug05.txt
01/01/2004  10:55 AM         1,479,529 mrndec03.txt
01/01/2005  11:31 AM         1,643,217 mrndec04.txt
03/01/2003  06:14 PM         1,285,420 mrnfeb03.txt
02/28/2004  11:01 AM         1,435,698 mrnfeb04.txt
03/01/2005  10:19 AM         1,443,562 mrnfeb05.txt
02/01/2003  06:16 PM         1,405,893 mrnjan03.txt
01/31/2004  10:58 AM         1,466,198 mrnjan04.txt
02/01/2005  06:48 PM         1,475,062 mrnjan05.txt
08/01/2003  10:45 AM         1,533,833 mrnjul03.txt
07/31/2004  11:15 AM         1,577,137 mrnjul04.txt
07/30/2005  06:26 PM         1,599,177 mrnjul05.txt
07/01/2003  10:43 AM         1,464,763 mrnjun03.txt
07/01/2004  11:13 AM         1,586,983 mrnjun04.txt
07/01/2005  09:39 AM         1,645,860 mrnjun05.txt
04/01/2003  03:10 PM         1,405,070 mrnmar03.txt
04/01/2004  11:04 AM         1,719,759 mrnmar04.txt
04/01/2005  07:17 PM         1,650,906 mrnmar05.txt
05/31/2003  08:32 AM         1,409,874 mrnmay03.txt
06/01/2004  11:09 AM         1,492,088 mrnmay04.txt
06/01/2005  10:55 AM         1,575,500 mrnmay05.txt
11/29/2003  10:53 AM         1,328,361 mrnnov03.txt
12/01/2004  11:28 AM         1,570,140 mrnnov04.txt
11/01/2003  03:16 PM         1,576,191 mrnoct03.txt
10/30/2004  11:25 AM         1,558,689 mrnoct04.txt
10/01/2003  10:51 AM         1,479,491 mrnsep03.txt
10/01/2004  11:22 AM         1,599,579 mrnsep04.txt
12/30/2000  11:40 AM        12,909,715 newmrn00.txt
01/01/2002  12:04 PM        15,083,716 newmrn01.txt
01/01/2003  12:33 PM        16,715,817 newmrn02.txt
01/01/1991  09:09 AM         5,815,310 newmrn90.txt
01/01/1992  09:19 AM         6,618,404 newmrn91.txt
01/01/1993  09:31 AM         7,191,765 newmrn92.txt
01/01/1994  09:43 AM         7,731,938 newmrn93.txt
12/31/1994  09:57 AM         8,467,874 newmrn94.txt
12/30/1995  10:11 AM         8,769,054 newmrn95.txt
01/01/1997  10:27 AM         9,489,616 newmrn96.txt
01/01/1998  10:43 AM        10,032,432 newmrn97.txt
01/01/1999  11:00 AM        10,717,629 newmrn98.txt
01/01/2000  11:19 AM        11,632,064 newmrn99.txt
               56 File(s)    180,852,625 bytes
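
	The three file types can be told apart by name.  A minimal
sketch, assuming the naming patterns are what the listing suggests
(they are inferred from the file names above, nothing more), with
'file_kind' as a hypothetical helper:

```ruby
# Assumed naming patterns, inferred from the directory listing only:
#   daily    mrn + 5 digits        e.g. mrn09165.txt
#   monthly  mrn + month + year    e.g. mrnaug05.txt
#   yearly   newmrn + year         e.g. newmrn99.txt
def file_kind(name)
  case name
  when /\Amrn\d{5}\.txt\z/i         then :daily
  when /\Amrn[a-z]{3}\d{2}\.txt\z/i then :monthly
  when /\Anewmrn\d{2}\.txt\z/i      then :yearly
  end                               # anything else yields nil
end

kinds = %w[mrn09165.txt mrnaug05.txt newmrn99.txt].map { |f| file_kind(f) }
```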

	Once up to about 120 ticks have been loaded for each
contract that exists in today's daily file, the file processing
is done, and I can start playing with the numbers.

	I have a Ruby class called 'Contract' that has a
singleton method called 'parseFile(name)', written in C++, to parse
the files.  I also have a C++ routine called
'getTicks(date=nil,days=1)' that returns an array of 'days' ticks,
in order, starting at 'date' (or today's date if 'date' is nil).
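
	For anyone who wants to experiment without the C++ extension,
here is a pure-Ruby stand-in.  It is only one reading of the
description: it assumes the ticks are kept oldest-first and that the
call returns the 'days' most recent ticks up to and including 'date'.
'Tick' and 'get_ticks' are hypothetical names, and all sample prices
other than the 09/09 close are invented.

```ruby
require 'date'

# Hypothetical stand-in for the C++ routine, not the real implementation.
Tick = Struct.new(:date, :open, :high, :low, :close)

# ticks: array of Tick sorted oldest-first.
# Returns up to `days` ticks ending at `date` (or at the newest tick
# when date is nil), oldest first.
def get_ticks(ticks, date = nil, days = 1)
  upto = date.nil? ? ticks : ticks.select { |t| t.date <= date }
  upto.last(days)
end

ticks = [
  Tick.new(Date.new(2005, 9, 7), 0.7690, 0.7720, 0.7680, 0.7701), # invented
  Tick.new(Date.new(2005, 9, 8), 0.7700, 0.7730, 0.7690, 0.7726), # invented
  Tick.new(Date.new(2005, 9, 9), 0.7726, 0.7758, 0.7703, 0.7737)
]

last_two  = get_ticks(ticks, nil, 2)
on_eighth = get_ticks(ticks, Date.new(2005, 9, 8))
```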

	Then I've got a bunch more routines to play with the data
that all call 'getTicks'.  Here's a small sample of the routines:
   def getOpen(date=nil, days=1)
     getTicks(date, days).collect {|t| t.open }
   end
   def getHigh(date=nil, days=1)
     getTicks(date, days).collect {|t| t.high }
   end
   def getLow(date=nil, days=1)
     getTicks(date, days).collect {|t| t.low }
   end
   def getClose(date=nil, days=1)
     getTicks(date, days).collect {|t| t.close }
   end
   def getChange(date=nil, days=1)
     close = getClose(date, days+1)
     diff  = close.dup
     diff.shift
     diff.each_with_index {|val, i|
       diff[i] = val - close[i]
     }
     diff
   end
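
	The arithmetic in 'getChange' can be tried on its own.  This
standalone sketch takes a plain array of closes (oldest first, invented
integer values) and uses 'each_cons' as an alternative to the
dup/shift loop; 'changes' is a hypothetical name, not part of the
Contract class.

```ruby
# Change between consecutive closing prices, oldest first.
# One more close than the number of changes wanted must be passed in,
# which is why getChange asks getClose for days+1 values.
def changes(closes)
  closes.each_cons(2).map { |prev, cur| cur - prev }
end

# The same calculation written the way getChange does it, for comparison.
def changes_shift_style(closes)
  diff = closes.dup
  diff.shift
  diff.each_with_index { |val, i| diff[i] = val - closes[i] }
  diff
end

sample = [10, 12, 11, 15]        # invented closes
result = changes(sample)         # => [2, -1, 4]
```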

	And that's about all there is to it.  Of course, I
could have written all the data processing stuff in C++ too,
but keeping it in Ruby gives me a lot more flexibility in
the things I can do, and now that the data load times are
very tolerable because of C++, I can play around and change
things really quickly.

	I hope that helps.
-- Glenn

Robert Klemme wrote:
>>A62005U,050909,0.7726,0.7758,0.7703,0.7737,12366,0
> 
> 
> Ok, at least we can generate a large data set with this kind of
> information.  You talked about repetitions etc.  Are there multiple
> entries per contract?
> 
> Plus, we need to know what exactly you want to do with the data.
> 
> Kind regards
> 
>     robert
>