On 8/12/06, Bill Kelly <billk / cts.com> wrote:
> Hi,
>
> What is being measured? Access time for files that already exist?
> Creation of new files? Scanning the directory structure for a list
> of existing files?
>
> At a prior gig, we used to split a couple hundred thousand
> encyclopedia articles up as 12/34/56.xxx sort of format. It worked
> adequately for our needs--our batch-oriented processing was expected
> to run overnight anyway--but my impression was that as long as the
> filename was known, accessing file 12/34/56.xxx seemed quick,
> whereas directory scans to enumerate the existing filenames were
> pretty slow.
>

I wanted to put a rough lower bound on the performance of the
approach, just to decide if it's worth pursuing at all. (Kirk
obviously thinks it is, but I don't know if the class of applications
that interests him is anything like the class that interests me.)

So I measured creation of 1k files and rewriting of 1k files. (The
first case involves creating an inode and data pages; the second
involves touching an inode and creating data pages.) I didn't test
reads of the files (which would involve touching an inode) because I
didn't feel like optimizing the test (by remounting my filesystem with
the inode-touch for last-access turned off).

The test box was a medium-powered Linux workstation with a 2.6 kernel,
a single SATA drive, and ext3 filesystems. I'd expect under normal
conditions to get maybe 15 megabytes per second of disk-write
bandwidth from this system, although with tuning I could probably get
a lot more. But again, I was going for a smell test here. This whole
approach is attractive because it's easy to code, so I'd want to use
it for light-duty applications on non-tuned hardware. For an
application with more stringent requirements, I'd make a different
tradeoff in development time and probably use a different approach.

Anyway, I first tried it on /dev/shm.
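A test of the kind described above could be sketched roughly as follows. This is a minimal Python sketch, not the script that produced the numbers in this post: the flat directory layout, the integer file names, the helper names `write_files` and `run_trial`, and the absence of any `fsync` call are all assumptions made for illustration.

```python
import os
import tempfile
import time


def write_files(directory, count, payload):
    """Write `count` files named 0..count-1 into `directory`.

    Returns the elapsed wall-clock time in seconds. No fsync is issued
    (an assumption), so the data may still sit in the page cache.
    """
    start = time.perf_counter()
    for i in range(count):
        with open(os.path.join(directory, str(i)), "wb") as f:
            f.write(payload)
    return time.perf_counter() - start


def run_trial(count, size=1024, base=None):
    """Time creating, then rewriting, `count` files of `size` bytes.

    `base` is the parent directory for the scratch directory; None
    means the platform's default temporary location.
    """
    payload = b"x" * size
    with tempfile.TemporaryDirectory(dir=base) as scratch:
        create_s = write_files(scratch, count, payload)   # new inode + data
        rewrite_s = write_files(scratch, count, payload)  # existing inode
        n_files = len(os.listdir(scratch))
    return create_s, rewrite_s, n_files


if __name__ == "__main__":
    create_s, rewrite_s, n_files = run_trial(10_000)
    print(f"{n_files} files: create {create_s:.2f}s, rewrite {rewrite_s:.2f}s")
```

Pointing `base` at a directory on another filesystem (for example `/dev/shm` versus an ext3 mount) repeats the comparison. Because nothing is synced, the timings largely reflect the page cache and the filesystem's metadata handling rather than raw disk bandwidth.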
The /dev/shm run worked really nicely for 10,000 files: it
consistently took about 0.9 seconds to create new files and 0.6
seconds to rewrite them. The same test with 100,000 files totally
destabilized the machine. I didn't want to reboot it so I waited.
Fifteen minutes later it was back. But what a strange journey that
must have been. Obviously this approach doesn't make a lot of sense on
a shm device anyway, but I had to know.

With a disk-based filesystem, things were a lot better. For 10,000
files, about 1.6 seconds to create them and 1.2 to rewrite them. Those
numbers were consistent across many trials. Results at 30,000 and
60,000 files were similar, just scaled upward. At 100,000 files things
got screwy. The create time became variable, ranging from 5 seconds to
almost 15 seconds from run to run. During all the runs, the machine
didn't appear to destabilize and remained responsive. Obviously
processor loads were very low. I didn't make a thorough study of page
faults and swapping activity though.

But notice that the implied throughput is an interestingly high
fraction of my notional channel bandwidth of 15 megabytes/sec. And the
journalling FS means that I don't even think about the movement of the
R/W head inside the disk drive anymore. (Of course that may matter a
great deal on Windows, but if I'm using Windows then *everything*
about the project costs vastly more anyway, so who cares?)

So I'm claiming without further study (and without trying to explain
the results) that the lower bound on performance is in the region of
5000 writes a second. That's just at the fuzzy edge of being worth
doing.

I usually think in terms of a "budget" for any operation that must be
repeated for a continuously-running server application. In general, I
want to be able to do a bare minimum of 1000 "useful things" per
second on a sustained basis, on untuned hardware. ("Useful things"
might be dynamic web-pages generated, or guaranteed-delivery messages
processed, etc.)
So a single write with this approach uses nearly 20% of my budget.
It's a big number.

(Just to show how I apply this kind of analysis: I never worry about
adding a SHA-1 hash calculation to any critical code path, because I
know I can do 250,000 of those per second without breaking a sweat.)
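The arithmetic behind these figures can be spelled out in a few lines. This is only a sketch using the numbers quoted in this post. The function names are made up for illustration, "1k" is assumed to mean 1024 bytes, and 15 megabytes/sec is assumed to mean 15e6 bytes/sec.

```python
FILE_SIZE_BYTES = 1024       # assumption: "1k" means 1024 bytes
CHANNEL_BYTES_PER_S = 15e6   # assumption: 15 megabytes/sec = 15e6 bytes/sec


def writes_per_second(n_files, seconds):
    """File writes completed per second for one timed run."""
    return n_files / seconds


def throughput_fraction(n_files, seconds):
    """Payload bytes per second as a fraction of the notional bandwidth."""
    return n_files * FILE_SIZE_BYTES / seconds / CHANNEL_BYTES_PER_S


def budget_fraction(ops_per_second, useful_things_per_second=1000):
    """Share of the time available per "useful thing" that one operation uses."""
    return useful_things_per_second / ops_per_second


if __name__ == "__main__":
    # Disk-based run: 10,000 files created in about 1.6 seconds.
    print(f"create rate:     {writes_per_second(10_000, 1.6):.0f} writes/s")
    print(f"bandwidth share: {throughput_fraction(10_000, 1.6):.0%}")
    # Worst 100,000-file run: almost 15 seconds.
    print(f"slowest rate:    {writes_per_second(100_000, 15):.0f} writes/s")
    # One file write at the claimed lower bound of 5000 writes/s.
    print(f"write budget:    {budget_fraction(5000):.1%}")
    # One SHA-1 at 250,000 hashes/s, for comparison.
    print(f"SHA-1 budget:    {budget_fraction(250_000):.1%}")
```

On those assumptions the 10,000-file create run works out to 6250 writes a second and roughly 43% of the notional bandwidth, one write at the 5000-per-second lower bound takes 20% of a 1 ms budget, and one SHA-1 takes 0.4%.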