On Tue, 24 Oct 2006 khaines / enigo.com wrote:

> On Tue, 24 Oct 2006, Joel VanderWerf wrote:
>>> The speed of the approach is tied very directly to the speed of the IO 
>>> system, but it's relatively fast and very portable so long as your file 
>>> naming scheme is itself portable.
>> 
>> Kirk,
>> 
>> Did you get a feeling for what the break-even point was? In other words,
>> how many files do you need before the depth 2 scheme beats a single flat
>> dir?
>> 
>> Playing around with this idea in fsdb, with 100_000 files, access times 
>> seem to get worse as depth varies from 0 to 2. Creation time was better at 
>> depth==1 than in either of the other cases.
>
> Not really.  The IO performance on my test machine is so bad that it didn't 
> seem to make a significant difference.
>
> You have me curious now, though, so I may go test it on a box with better IO 
> and see what I can find out.

doesn't ext3 use a btree under the hood?  if so you'd basically just be adding a
method call per level, since a lookup in a directory of 100_000 entries is still quite
fast - on the other hand, any piece of code that puts 100_000 files in one directory is evil!
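
roughly the kind of depth-2 naming scheme i mean - just a sketch, the hash and the
names (make_path, "store", the session key) are made up for illustration, not what
fsdb or kirk's code actually does:

  /* depth-2 naming scheme sketch: hash the key and use two hex-digit
   * pairs as subdirectory names, e.g. "store/3f/a1/session-12345". */
  #include <stdio.h>

  /* toy 32-bit FNV-1a hash -- any stable hash would do */
  static unsigned int fnv1a(const char *s)
  {
      unsigned int h = 2166136261u;
      while (*s) { h ^= (unsigned char)*s++; h *= 16777619u; }
      return h;
  }

  /* build "root/xx/yy/key" into buf */
  static void make_path(char *buf, size_t len, const char *root, const char *key)
  {
      unsigned int h = fnv1a(key);
      snprintf(buf, len, "%s/%02x/%02x/%s",
               root, (h >> 24) & 0xff, (h >> 16) & 0xff, key);
  }

  int main(void)
  {
      char path[512];
      make_path(path, sizeof(path), "store", "session-12345");
      puts(path);   /* e.g. store/ab/cd/session-12345 */
      return 0;
  }

each level costs you a path component (and a directory lookup) instead of one huge
flat scan - which is the trade-off being measured above.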

i guess what i'm driving at is that what you may be comparing is method call
overhead vs big-O behaviour, and not IO speed.  i had this happen to me once
before when looking for the fastest way to search a huge in-memory table.  i tried
gperf, the std:: crap, sqlite in mem, bdb in mem, etc.  in the end c's built-in
bsearch, including the initial sort, was way faster even for just 10 or 20 searches.  the
reason was that, after the initial hit of sorting, the function call overhead
for bsearch, which just flips pointers all over the place, was far less than
that of gperf, which made quite a few function calls per lookup.  anyhow, it was a
surprise to me that an O(log n) solution ran much faster in the real world
than an O(1) one, even with pretty dang big data sets...
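
for reference, the qsort-once-then-bsearch pattern i mean looks roughly like this
(toy data, not the actual table i was searching):

  /* pay for the sort once up front, then each lookup is just a few
   * comparisons through a function pointer. */
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  static int cmp(const void *a, const void *b)
  {
      return strcmp(*(const char * const *)a, *(const char * const *)b);
  }

  int main(void)
  {
      const char *table[] = { "zebra", "apple", "mango", "kiwi", "pear" };
      size_t n = sizeof(table) / sizeof(table[0]);

      qsort(table, n, sizeof(table[0]), cmp);   /* one-time O(n log n) cost */

      const char *key = "mango";
      const char **hit = bsearch(&key, table, n, sizeof(table[0]), cmp);  /* O(log n) per lookup */

      printf("%s %s\n", key, hit ? "found" : "not found");
      return 0;
  }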

food for thought - i'd be curious to hear what you find out.

;-)

-a
-- 
my religion is very simple.  my religion is kindness. -- the dalai lama