On Jan 11, 2008 1:06 PM, Mike Fletcher <lemurific+rforum / gmail.com> wrote:
> Kyle Schmitt wrote:
> > Entries seems to be fairly identical to collect, and it does look
> > nicer...
> > but yea still slow.
> >
> > The problem with caching is that we only keep quarantine directories
> > around for 10 days, due to their size and the relative rarity of us
> > needing to pull something out of it. One reason for writing this as a
> > script is that we recover rarely enough that whoever is doing it
> > forgot how to recover. Still, it's often enough that we want to be
> > able to do it easily.
>
> If there's a large number of files in these directories that's probably
> the source of the slowness, not the method used to get the list of
> entries.
>
> Many filesystems (some less than others) don't behave as well when you
> get a "large" number of files in one directory. I think the rule of
> thumb I've used for ext2 filesystems is you'll start to notice a delay
> when you get a few hundred entries, and you'll start to feel it when you
> have thousands.
>
> One way around this (short of installing / upgrading to a new underlying
> filesystem that handles these cases better (xfs, for example)) is to
> split files out into a directory tree based either on the filename
> directly or a hash made from the real filename (say an MD5 hex string of
> the filename and you make two levels based on the first 4 hex digits,
> 00/00, 00/01, ..., ff/fe, ff/ff; 00/00 contains all files for which the
> hashed filename begins "0000...", etc.). The downside of this is that
> you either have to walk the entire tree to see the contents, or keep an
> external index of the contents (which would eliminate your needing to do
> what you're trying to do and the justification for splitting things up,
> but . . . :).
>
> --
> Posted via http://www.ruby-forum.com/.
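
For reference, the hashed layout described above might look roughly like
this in Ruby. This is only a sketch under assumptions: the helper names
(bucket_dir, quarantine, find_quarantined) are made up for illustration
and are not from any existing script, and it assumes no two quarantined
files share the same basename.

```ruby
require 'digest/md5'
require 'fileutils'

# Hypothetical helper: the directory a filename hashes into, using the
# first 4 hex digits of the MD5 of its basename as two levels (e.g. "3f/a9").
def bucket_dir(root, filename)
  hex = Digest::MD5.hexdigest(File.basename(filename))
  File.join(root, hex[0, 2], hex[2, 2])
end

# Hypothetical helper: move a file into its bucket, creating the two
# directory levels on demand. Returns the new path.
def quarantine(root, path)
  dir = bucket_dir(root, path)
  FileUtils.mkdir_p(dir)
  dest = File.join(dir, File.basename(path))
  FileUtils.mv(path, dest)
  dest
end

# Hypothetical helper: find a file by its original name without listing
# any directory. Returns the path, or nil if it is not there.
def find_quarantined(root, filename)
  candidate = File.join(bucket_dir(root, filename), File.basename(filename))
  File.exist?(candidate) ? candidate : nil
end
```

Recovering a file whose name is already known only touches one small
directory this way. Listing everything in quarantine still means walking
the whole tree, which is the downside mentioned in the quoted message.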

Mike,

I've been an advocate of using the right file system for the job for
ages now, but the sad fact is that this is running on a rather old
version of RedHat, which doesn't really support anything other than
ext2 and ext3.

As for possible upgrade paths for this box, it would still be RedHat or
a clone (CentOS). From what I can see, they still don't support modern
file systems by default. Admittedly I'm tempted to add the support
myself (it's not hard), but then it'll bring up the "it's a production
system" argument here. *sigh*

--Kyle