Clifford Heath <cjh_nospam / managesoft.com> writes:

> Note that Linux "cat" doesn't move the data twice. Instead it mmap's

That's strange. My GNU cat does not use mmap. It uses read() and
write().

> Either way, ruby can't do this, but it can use fread! What about it Matz?

> $ time ./cat_read < /tmp/ten_megabytes  > /dev/null
> real    0m0.085s
> user    0m0.000s
> sys     0m0.080s
> $ time ./cat_fread < /tmp/ten_megabytes  > /dev/null
> real    0m0.086s
> user    0m0.000s
> sys     0m0.080s

In the above benchmark, fread(3) and read(2) do not differ by
much. But, at least in glibc 2.2.4, fread(3) is merely a buffering and
portability layer on top of read(2). So, theoretically, calling
read(2) directly should be at least as fast as going through fread(3).
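
For anyone who wants to repeat the comparison from Ruby, here is a
rough sketch of the two copy loops, one on the buffered layer
(IO#read / IO#write) and one on the unbuffered layer (IO#sysread /
IO#syswrite). The names copy_buffered, copy_unbuffered and CHUNK are
mine, not anything from the Ruby source, and I make no claim about
which one wins on a given machine.

```ruby
CHUNK = 64 * 1024 # hypothetical chunk size, chosen arbitrarily

# Buffered layer: IO#read returns nil at end of file.
def copy_buffered(src, dst)
  while (buf = src.read(CHUNK))
    dst.write(buf)
  end
  dst.flush
end

# Unbuffered layer: IO#sysread raises EOFError at end of file,
# and IO#syswrite may write fewer bytes than it was given.
def copy_unbuffered(src, dst)
  loop do
    buf = src.sysread(CHUNK)
    until buf.empty?
      written = dst.syswrite(buf)
      buf = buf.byteslice(written..-1)
    end
  end
rescue EOFError
  nil
end
```

Calling copy_buffered($stdin, $stdout) or copy_unbuffered($stdin,
$stdout) under time(1) should give numbers comparable to the ones
quoted above, as long as each run sticks to one layer.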

On another note, mmap cannot be used as a generic reading
mechanism. It requires an fd that refers to something mappable, such
as a regular file, so $stdin (which may be a pipe or a terminal) would
have to be handled differently from other IO objects. That is too much
hassle, and for the case of cat-ing there won't be any improvement,
since the file access is linear, not random. In fact, hunting down
mmap() in filemap.c (which gave me a headache) in the Linux 2.4.18
source makes me think that for strictly linear access, mmap will
suffer because of the overhead.

> Since fread is almost as fast as read, the restriction on not mixing
> sysread and read could perhaps be relaxed too?

You're not the only one confused about the existence of both #sysread
and #read; I am too. Both rb_io_read and rb_io_sysread do basically
the same thing. The only difference is that the former uses getc(3)
and the latter read(2). Since they are not on the same layer, calling
one after the other confuses the system: stdio may already have
buffered data that a direct read(2) knows nothing about. Simply
changing #sysread to use fread(3) would eliminate the confusion, and
the price is a very small overhead. But Matz didn't do it.
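
To make the layering problem concrete, here is a small sketch that
takes one character through the buffered layer and then tries the
unbuffered layer on the same IO. The method name mix_layers is made
up for illustration. On the MRI versions I know of, I would expect
the second call to fail with IOError, because the buffered layer has
already pulled more than one byte from the file, but treat that as an
assumption about the interpreter rather than a guarantee.

```ruby
# Reads one character with IO#getc (buffered), then tries IO#sysread
# (unbuffered) on the same file. Returns the first character plus
# either the sysread result or the class of the error it raised.
def mix_layers(path)
  File.open(path, "rb") do |f|
    first = f.getc # fills the read buffer beyond this one byte
    begin
      [first, f.sysread(1)]
    rescue IOError => e
      [first, e.class]
    end
  end
end
```

Doing it the other way round, #sysread first and #getc afterwards,
leaves nothing stranded in a buffer, which is why the order of the
calls matters.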

Is there anything that can be done with read(2) but can't be done with
fread(3)? If not, then the only reason I can think of is that #sysread
is there to let you utilise the maximum capability of the OS. Could
this be true?

YS.