On Jan 11, 2008 1:06 PM, Mike Fletcher <lemurific+rforum / gmail.com> wrote:
> Kyle Schmitt wrote:
> > Entries seems to be essentially identical to collect, and it does look
> > nicer...
> > but yeah, still slow.
> >
> > The problem with caching is that we only keep quarantine directories
> > around for 10 days, due to their size and how rarely we need to pull
> > something out of them.  One reason for writing this as a script is
> > that we recover rarely enough that whoever is doing it has forgotten
> > how.  Still, it happens often enough that we want to be able to do it
> > easily.
>
> If there's a large number of files in these directories, that's probably
> the source of the slowness, not the method used to get the list of
> entries.
>
>
> Many filesystems (some more than others) don't behave well once a
> single directory holds a "large" number of files.  The rule of thumb
> I've used for ext2 is that you'll start to notice a delay at a few
> hundred entries, and you'll really feel it at a few thousand.
>
>
> One way around this, short of installing or upgrading to a filesystem
> that handles these cases better (xfs, for example), is to split files
> out into a directory tree based either on the filename directly or on a
> hash of the real filename.  For example, take an MD5 hex string of the
> filename and make two levels from its first 4 hex digits: 00/00, 00/01,
> ..., ff/fe, ff/ff, where 00/00 contains all files whose hashed name
> begins "0000...", and so on.  The downside is that you either have to
> walk the entire tree to see the contents, or keep an external index of
> them (which would eliminate both your need to do what you're trying to
> do and the justification for splitting things up, but . . . :).
>
>
> --
> Posted via http://www.ruby-forum.com/.
>
>
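A minimal Ruby sketch of the hashed layout described above, assuming MD5 of the file's basename and two levels of two hex digits each (256 x 256 = 65,536 leaf directories). The helpers `hashed_path`, `store`, and `all_entries` are hypothetical names for illustration, not part of any library or of the original script. The last helper shows the stated downside: listing everything means globbing every bucket.

```ruby
require 'digest/md5'
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: bucket a file by the first four hex digits of the
# MD5 of its name, giving two levels 00/00 .. ff/ff (65,536 leaf dirs).
def hashed_path(root, filename)
  hex = Digest::MD5.hexdigest(filename)
  File.join(root, hex[0, 2], hex[2, 2], filename)
end

# Move +source+ into its bucket under +root+, creating both levels if needed.
def store(root, source)
  dest = hashed_path(root, File.basename(source))
  FileUtils.mkdir_p(File.dirname(dest))
  FileUtils.mv(source, dest)
  dest
end

# The downside: seeing everything means walking every bucket.
# Note that '*' does not match names starting with '.', so dotfiles are skipped.
def all_entries(root)
  Dir.glob(File.join(root, '*', '*', '*')).map { |p| File.basename(p) }.sort
end

# Usage: spread three files into a temporary tree, then list them back.
Dir.mktmpdir do |tmp|
  incoming = File.join(tmp, 'incoming')
  FileUtils.mkdir_p(incoming)
  %w[a.eml b.eml c.eml].each do |name|
    File.open(File.join(incoming, name), 'w') { |f| f.write(name) }
  end
  quarantine = File.join(tmp, 'quarantine')
  Dir.glob(File.join(incoming, '*')).each { |f| store(quarantine, f) }
  p all_entries(quarantine)  # => ["a.eml", "b.eml", "c.eml"]
end
```

Finding a file whose name you already know only touches one small bucket; it is the full listing that pays for walking the whole tree.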

Mike,
        I've been an advocate of using the right file system for the
job for ages now, but the sad fact is that this is running on a rather
old version of RedHat, which doesn't really support anything other than
ext2 and ext3.  Any upgrade path for this box would still be RedHat or
a clone (CentOS), and from what I can see, they still don't support
modern file systems by default.  Admittedly I'm tempted to add the
support myself (it's not hard), but then it'll bring up the "it's a
production system" argument here.

*sigh*
--Kyle