On Sat, 12 Aug 2006, Francis Cianfrocca wrote: > Kirk, how did you do this? Are you storing immutable objects and > turning older versions into garbage somehow? If you have no indexing, > then how do you enforce a consistent view across processes without > incurring disk i/o (at least having to read the inode)? Or are you > managing an arena that is backed by shared memory and you convert the > key values deterministically into offsets? > > Sorry for all the questions, but I've just had to fight with this > problem too many times and it would be great if you've come up with a > better mousetrap. I don't think it is necessarily a better mousetrap. In this case, I'm just willing to make the filesystem do some of the work for me instead of keeping a separate index in memory. I'm using a modified version of Ara Howard's lockfile.rb to handle locking between processes. It works over NFS, and I have modified it so that it also works on Windows by automatically falling back to a Windows safe locking method. I create a hash (Digest::SHA512 based) out of the key, and use that key as the filename for the data. There is also a second file written that contains some metadata (the structure on disk is actually a linked list to more easily support some of the LRU aspects and element expiration). There are a couple of metadata files that also maintain some information on the data store as a whole. The code automatically breaks the file storage into a tree of directories to help avoid any filesystem problems with having too many files in a single directory. The size of this tree is tuneable so that a data store that may have a million entries may have more nodes (directories) than one that is only expected to have 10000. So, retrieving an element from the data store is reduced to hashing the key, finding the file, and reading it. It's disk i/o, but less than PStore generates most of the time. Since my need was for something that could do this while also being treated as an LRU cache, there is some extra work mixed into there to maintain the meta data and expire elements from the cache (it can also expire them based on age), and that's really where most of the performance hit comes from. I'm working on creating a second version that strips out all of the LRU cache code to provide a simple, fast data persistence implementation without any of that extra overhead, and will be better able to report on just how much of a performance hit that overhead brings with it soon. Thanks, Kirk Haines