On Thursday, April 17, 2003, at 06:45 AM, Jim Freeze wrote:

> On Thursday, 17 April 2003 at 19:29:16 +0900, David King Landrith
> wrote:
>
>> In my experience, the fastest way to access files (by far) is mmap.
>
> [mmap stuff deleted]
>
> Thanks much. I'll look into this.
> I have already gone from an 18% loss to a 15.5% lead
> over perl by switching to rb_io_gets.

mmap should be dramatically faster than rb_io_gets, for a few reasons.
For reasonably sized files, reading through a mapped file is as fast as
walking through a string in memory. For longer files, the data gets
paged in and out of memory as fast as the operating system will allow.
So if you're writing your code in C, the bottleneck is likely to be in
the operating system (the kernel's paging and I/O), not in your
application.

By the way, you can also use mmap to write, so you can read from one
mapping and write to another.

mmap has its disadvantages, too. For example, you lose a lot of
granular control over how memory is allocated.

> Do you think that mmap will get me speeds near cp?

I've never looked at the source for GNU fileutils, so I don't know how
cp works.

[BEGIN SPECULATION]
The fact that much of what cp does is a straight copy of binary data
may allow for optimizations that are not available to the more general
approach we're using, in which reading the data allows pretty much any
use of it. So cp may well remain somewhat faster.
[END SPECULATION]

Perhaps someone else on the list can speak on this topic with more
authority. I would, however, be very surprised if cp were dramatically
faster than mmap.

Best,
Dave
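
For reference, here is a minimal sketch in C of the two uses of mmap mentioned above: scanning a mapped file as if it were a string, and copying by reading from one mapping and writing to another. It is an illustration under assumptions, not code from this thread: the function names count_lines_mmap and copy_mmap are hypothetical, the platform is assumed to be POSIX, and error handling is reduced to returning -1.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: count '\n' bytes by mapping the file read-only.
 * Returns the count, or -1 on error. */
long count_lines_mmap(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {      /* mmap rejects zero-length mappings */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    char *data = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                  /* the mapping stays valid after close */
    if (data == MAP_FAILED)
        return -1;

    /* From here on the file is just a byte array in memory. */
    long lines = 0;
    const char *p = data;
    const char *end = data + len;
    while (p < end && (p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
        lines++;
        p++;
    }

    munmap(data, len);
    return lines;
}

/* Hypothetical helper: copy src_path to dst_path by reading from one
 * mapping and writing to another. Returns 0 on success, -1 on error. */
int copy_mmap(const char *src_path, const char *dst_path)
{
    int in = open(src_path, O_RDONLY);
    if (in < 0)
        return -1;

    struct stat st;
    if (fstat(in, &st) < 0) {
        close(in);
        return -1;
    }
    size_t len = (size_t)st.st_size;

    /* A writable MAP_SHARED mapping needs a descriptor opened O_RDWR. */
    int out = open(dst_path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        close(in);
        return -1;
    }

    int rc = -1;
    if (len == 0) {
        rc = 0;                 /* nothing to map or copy */
    } else if (ftruncate(out, (off_t)len) == 0) {
        /* The destination must already have its final size before mapping. */
        void *src = mmap(NULL, len, PROT_READ, MAP_PRIVATE, in, 0);
        void *dst = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, out, 0);
        if (src != MAP_FAILED && dst != MAP_FAILED) {
            memcpy(dst, src, len);
            rc = 0;
        }
        if (src != MAP_FAILED)
            munmap(src, len);
        if (dst != MAP_FAILED)
            munmap(dst, len);
    }

    close(in);
    close(out);
    return rc;
}
```

Calling count_lines_mmap("some_file") returns the number of newline bytes in that file, and copy_mmap("a", "b") writes a byte-for-byte copy of a to b. The sketch does not call msync, so it makes no promise about when the copied data reaches the disk, and it says nothing about how either function compares with cp or rb_io_gets in speed; that would have to be measured.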