James Edward Gray II wrote:
> On Oct 23, 2006, at 4:36 AM, khaines / enigo.com wrote:
> 
>> In my case I create a hash of the storage key and use that as part of 
>> the filename, and then I had the code make subdirectories based on the 
>> starting letters of the hash in order to avoid having too many files 
>> in any one directory.  My default was to take a chunk of two 
>> characters from the beginning of the hash as subdirectory name, twice.
>>
>> So a hash starting with 'abd45cc' would have its file in ab/d4/
> 
> This sounds like it would make a neat open source project.  Any plans to 
> share the code?
> 
> James Edward Gray II

I'm guessing it is something like this:

--- flat.rb ---
# Fast, flat storage based on Kirk Haines' technique.

require 'fsdb'
require 'digest/md5'

class FlatDB < FSDB::Database

   def initialize(path, depth = 2)
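     # an MD5 hex digest is 32 chars long, so at most 16 two-char levels
     unless (0..16).include?(depth)
       raise ArgumentError, "Invalid depth #{depth}: must be in 0..16"
     end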

     @path_from_key = {}
     @path_pat = Regexp.new("^" + "(..)"*depth)
     @depth = depth

     super(path)
   end

   def path_from_key(key)
     path = @path_from_key[key]
     unless path
       if @depth > 0
         path_components = Digest::MD5.hexdigest(key).scan(@path_pat).first
         path_components << key
         path = path_components.join("/")
       else
         path = key
       end
       @path_from_key[key] = path
     end
     path
   end

   def browse(key)
     super path_from_key(key)
   end

   def edit(key)
     super path_from_key(key)
   end

   def replace(key)
     super path_from_key(key)
   end

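   def delete(key, load=true)
     # compute the path before dropping the cache entry; calling
     # path_from_key afterwards would just re-cache it
     path = path_from_key(key)
     @path_from_key.delete key
       ## should probably purge this hash periodically
     super path, load
   end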

   # don't bother with #insert, #fetch, #[], #[]= since they are
   # inherently less efficient
end

db = FlatDB.new('/tmp/fsdb/flat', 2)

db.replace 'foo.txt' do
   "this is the foo text"
end

db.browse 'foo.txt' do |x|
   p x
end

# key names can have '/' in them, in which case they reference deeper
# subdirs
db.replace 'sub/foo.txt' do
   "this is the subdir's foo text"
end

db.browse 'sub/foo.txt' do |x|
   p x
end

require 'benchmark'

Benchmark.bm(10) do |bm|
   nfiles = 100

   [0,1,2].each do |depth|
     db = FlatDB.new("/tmp/fsdb/depth_#{depth}", depth)

     puts "\ndepth=#{depth}"

     bm.report "create" do
       nfiles.times do |i|
         db.replace i.to_s do
           i  # this will be marshaled
         end
       end
     end

     bm.report "access" do
       nfiles.times do |i|
         db.browse i.to_s do |j|
           raise unless i == j
         end
       end
     end
   end
end

__END__

Results for nfiles=100_000 (the script above sets nfiles = 100), linux 2.6.15,
ext3 with no special params, 1.7GHz centrino, 40GB Fujitsu laptop drive,
running at nice -n -20:

"this is the foo text"
"this is the subdir's foo text"
                 user     system      total        real

depth=0
create     72.680000 1772.030000 1844.710000 (1853.686824)
access     55.780000  13.090000  68.870000 ( 97.170382)

depth=1
create    125.170000  24.250000 149.420000 (329.576419)
access    143.210000  12.040000 155.250000 (759.768371)

depth=2
create    263.900000  32.570000 296.470000 (1950.482468)
access    195.200000  17.250000 212.450000 (1562.214063)

# du -sk depth_0
804236  depth_0
# du -sk depth_1
804832  depth_1
# du -sk depth_2
1006408 depth_2
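One rough way to relate depth to the numbers above is to count how many leaf directories the benchmark keys actually land in. The sketch below is an illustration, not part of flat.rb: flat_dir and flat_path are hypothetical helpers that mirror FlatDB#path_from_key using only Ruby's standard digest library, so fsdb is not needed. At depth=2 there are 256**2 = 65,536 possible leaf directories for 100,000 keys, i.e. on the order of one or two files per directory. It seems plausible (though not measured here) that this per-directory overhead explains much of the extra space du reports for depth_2.

```ruby
require 'digest/md5'

# Hypothetical helper mirroring FlatDB#path_from_key: take `depth`
# two-character chunks from the start of the key's MD5 hex digest.
def flat_dir(key, depth)
  Digest::MD5.hexdigest(key)[0, 2 * depth].scan(/../).join("/")
end

# Full relative path of the file for `key` (just the key at depth 0).
def flat_path(key, depth)
  dir = flat_dir(key, depth)
  dir.empty? ? key : "#{dir}/#{key}"
end

# Count the distinct leaf directories used by the benchmark's keys.
nfiles = 100_000
[1, 2].each do |depth|
  used = (0...nfiles).map { |i| flat_dir(i.to_s, depth) }.uniq.size
  possible = 256**depth
  printf("depth=%d: %d of %d leaf directories used, %.1f files each on average\n",
         depth, used, possible, nfiles.to_f / used)
end
```

The count is exact for these keys rather than an estimate, because it applies the same MD5 prefix rule FlatDB uses.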

-- 
       vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407