On Thursday, April 17, 2003, at 06:45 AM, Jim Freeze wrote:

> On Thursday, 17 April 2003 at 19:29:16 +0900, David King Landrith 
> wrote:
>
>> In my experience, the fastest way to access files (by far) is mmap.
>
> [mmap stuff deleted]
>
> Thanks much. I'll look into this.
> I have already gone from an 18% loss to a 15.5% lead
> over perl by switching to rb_io_gets.

mmap should be dramatically faster than rb_io_gets for a few reasons.  
For reasonably sized files, reading through an mmap is about as fast as 
walking through a string in memory.  For larger files, the data gets 
paged in and out of memory as fast as the operating system will allow.  
So if you're writing your code in C, the bottleneck is likely to be in 
the kernel or the disk, not your application.
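
To make the "rolling through a string" point concrete, here is a rough 
sketch in C of reading a file through mmap.  The helper name 
count_lines_mmap is made up for this example (it is not part of Ruby or 
any library), and it assumes a POSIX system and a file that fits in the 
address space.  Treat it as an illustration, not tuned code.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: count '\n' bytes in a file by mapping it
 * read-only and scanning it like an in-memory string.
 * Returns the count, or -1 on error. */
long count_lines_mmap(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {          /* mmap rejects zero-length mappings */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    char *data = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (data == MAP_FAILED)
        return -1;

    long lines = 0;
    const char *p = data;
    const char *end = data + len;
    /* memchr walks the mapped bytes; the kernel pages them in on demand */
    while ((p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
        lines++;
        p++;
    }

    munmap(data, len);
    return lines;
}
```

Note that there is no read loop and no buffer management here: the 
application only sees a pointer and a length.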

By the way, you can also use mmap to write, so you can read from one 
mapping and write to another.
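
A minimal sketch of that read-one-map, write-another idea, again in C 
on a POSIX system.  The function name copy_file_mmap and the 0644 
permissions are my own choices for the example, and it maps each file 
in one piece, which will not work for files larger than the available 
address space.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: copy src_path to dst_path using two mappings.
 * Returns 0 on success, -1 on error. */
int copy_file_mmap(const char *src_path, const char *dst_path)
{
    int rc = -1;
    void *src, *dst;
    struct stat st;

    int src_fd = open(src_path, O_RDONLY);
    if (src_fd < 0)
        return -1;
    if (fstat(src_fd, &st) < 0) {
        close(src_fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;

    /* A writable MAP_SHARED mapping needs the file opened O_RDWR */
    int dst_fd = open(dst_path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (dst_fd < 0) {
        close(src_fd);
        return -1;
    }

    if (len == 0) {                 /* nothing to map for an empty file */
        rc = 0;
        goto done;
    }

    /* The destination must already be the right size before mapping */
    if (ftruncate(dst_fd, (off_t)len) < 0)
        goto done;

    src = mmap(NULL, len, PROT_READ, MAP_PRIVATE, src_fd, 0);
    if (src == MAP_FAILED)
        goto done;
    dst = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, dst_fd, 0);
    if (dst == MAP_FAILED) {
        munmap(src, len);
        goto done;
    }

    memcpy(dst, src, len);          /* the "copy" is a plain memory copy */
    if (msync(dst, len, MS_SYNC) == 0)   /* flush dirty pages to the file */
        rc = 0;

    munmap(dst, len);
    munmap(src, len);
done:
    close(dst_fd);
    close(src_fd);
    return rc;
}
```

The ftruncate call matters: writing past the end of the file through a 
mapping does not extend the file, so the output has to be sized first.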

mmap has its disadvantages, too.  For example, you lose a lot of 
granular control over how memory is allocated.

> Do you think that mmap will get me speeds near cp?

I've never looked at the source for GNU fileutils, so I don't know how 
cp works.  [BEGIN SPECULATION] The fact that cp does a straight copy of 
binary data may allow for optimizations that our more general approach 
does not, since we read the data in a way that allows pretty much any 
use of it.  So cp may well remain somewhat faster.  [END SPECULATION]  
Perhaps someone else on the list can speak on this topic with more 
authority.  I would, however, be very surprised if cp were dramatically 
faster than an mmap-based approach.

Best,

Dave

-------------------------------------------------------
David King Landrith
   (w) 617.227.4469x213
   (h) 617.696.7133

One useless man is a disgrace, two
are called a law firm, and three or more
become a congress   -- John Adams
-------------------------------------------------------
public key available upon request