On Sat, 12 Aug 2006, Francis Cianfrocca wrote:

> With all the times I've reinvented this wheel, I've never tried
> storing millions of data elements each in its own file. Do you have
> performance metrics you can share? Did you tune it for a particular
> filesystem?

Not really.  I haven't kept metrics from any of the tests to date.  The 
largest test created a million-item cache, though, and I ran into no 
problems.

The main consideration, if one plans on storing large numbers of 
elements, is making sure that the filesystem is created with large 
numbers of small files in mind -- enough inodes, for example.

However, a lot of files can fit into a surprisingly non-messy directory 
structure.

Consider a key whose SHA512 hash begins with:

dd232b224979c0e

If I am running a cache with a bucket depth of 2 and a bucket width of 2, 
the file for that key is going to go into the directory

dd/23/

Given an even distribution of hashes, with a million records one should 
have 256 directories at the top level, 256 subdirectories under each of 
those (65,536 leaf directories in all), and 15 or 16 files in each leaf 
directory (1,000,000 / 65,536 is about 15.3).

That's pretty easy to handle efficiently.
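The bucketing scheme above can be sketched in a few lines of Ruby.  The 
method name and keyword arguments here are illustrative, not taken from 
any particular cache implementation:

```ruby
require 'digest'

# Compute an on-disk path for a key by bucketing on the leading hex
# characters of its SHA512 digest.  With depth 2 and width 2, a digest
# beginning "dd23..." lands in dd/23/.
def bucket_path(key, depth: 2, width: 2)
  hex  = Digest::SHA512.hexdigest(key)
  dirs = (0...depth).map { |i| hex[i * width, width] }
  File.join(*dirs, hex)
end
```

Digest::SHA512 ships in Ruby's standard library, so nothing beyond the 
require is needed; a real cache would also mkdir_p the bucket directories 
before writing.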


Kirk Haines