Robert Klemme wrote:
> On 06.07.2009 21:47, Greg Willits wrote:
> 
>>> What about storing all file offsets
>>> in an Array and write it to a file
>> 
>> That's a possibility that would be an easy retrofit and probably w/o 
>> breaking the current interfaces. It would use a little more RAM, but 
>> would eliminate calculating that offset each time during reads. That's 
>> not a real pinch point, but worth an experiment just because it's a 
>> simpler concept to recognize.
> 
> I'd be interested to learn the outcome of that exercise.


In my original system, I used the row lengths and the number of rows per 
"set" to pre-calculate a lookup table of offsets to the start of each set. 
Upon receiving a request for record 623,456, I first had to determine 
which set that row fell in (the 4th set, at 200,000 rows per set). So I'd 
use a floor calc for that, then subtract that many whole sets of rows 
from 623,456, and then apply the basic row_length multiplier. So there 
were a few calcs involved on every read.

It looked like this:

set_indx      = (row_number.to_f / @set_size.to_f).floor   # which set the row falls in
set_sub_indx  = row_number - (@set_size * set_indx)        # row's position within that set
record_start  = @set_offsets[set_indx] + (set_sub_indx * @record_lengths[set_indx])

By saving the row-specific offsets in an array, that is simplified to:

record_start = @record_offsets[row_number]
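Something along these lines would build that offsets array up front (a 
rough sketch assuming one fixed record length per set, as described 
above -- not the exact code):

# Build a per-row offset table once so each read is a single Array lookup.
# Assumes @set_size rows per set and one record length per set; a partial
# final set would need a small adjustment.
@record_offsets = []
@record_lengths.each_with_index do |rec_len, set_indx|
  set_start = @set_offsets[set_indx]
  @set_size.times do |set_sub_indx|
    @record_offsets << set_start + (set_sub_indx * rec_len)
  end
end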

Switching to the precomputed array made the total process of fetching 
rows roughly 15% faster (about 32 seconds vs. 38 seconds to read 1.4 
million rows on my dev system).
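A minimal timing sketch of that comparison, using the stdlib Benchmark 
module, might look like the following (not the actual harness -- and the 
32 vs. 38 seconds above include the file reads themselves, while this 
only isolates the offset math):

require 'benchmark'

rows = 1_400_000

calc_time = Benchmark.realtime do
  rows.times do |row_number|
    set_indx     = (row_number.to_f / @set_size.to_f).floor
    set_sub_indx = row_number - (@set_size * set_indx)
    @set_offsets[set_indx] + (set_sub_indx * @record_lengths[set_indx])
  end
end

array_time = Benchmark.realtime do
  rows.times { |row_number| @record_offsets[row_number] }
end

puts format("calc: %.1fs  array: %.1fs", calc_time, array_time)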

I already index the data itself using hashes (which are very, very fast) 
for aggregation lookups, so this concept is quite parallel to how the 
rest of the code works, and it's a worthwhile change to make -- so 
thanks for that idea.
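For illustration, that kind of hash index amounts to mapping a key pulled 
from each record to the row numbers that carry it, so an aggregation 
lookup is a single hash access rather than a scan (the names here -- 
records, :customer_id -- are invented for the example):

# Illustrative sketch only; 'records' and :customer_id are made-up names.
row_index = Hash.new { |h, k| h[k] = [] }
records.each_with_index do |record, row_number|
  row_index[record[:customer_id]] << row_number
end
matching_rows = row_index[12345]   # all row numbers carrying that key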

-- gw


-- 
Posted via http://www.ruby-forum.com/.