Yohanes Santoso wrote:
> IO#read reads a character at a time using getc (man 3 getc), that's
> why it's so slow.

Sigh. Why do they *always* do it the haaard way... :-)

Seriously though, a simple fread() call would improve things enormously:

$ time /bin/cat < /tmp/ten_megabytes  > /dev/null
real    0m0.035s
user    0m0.000s
sys     0m0.030s
$ time ./cat_read < /tmp/ten_megabytes  > /dev/null
real    0m0.085s
user    0m0.000s
sys     0m0.080s
$ time ./cat_fread < /tmp/ten_megabytes  > /dev/null
real    0m0.086s
user    0m0.000s
sys     0m0.080s
$ time ./cat_getchar < /tmp/ten_megabytes  > /dev/null
real    0m2.017s
user    0m1.980s
sys     0m0.030s
$ time ruby cat_sysread.rb /tmp/ten_megabytes > /dev/null
real    0m0.154s
user    0m0.060s
sys     0m0.090s
$ time ruby cat_read.rb /tmp/ten_megabytes > /dev/null
real    0m1.294s
user    0m1.270s
sys     0m0.030s
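
For reference, the block-at-a-time approach behind a script like
cat_sysread.rb can be sketched roughly as follows. This is only an
illustration, not the script timed above: the copy_blocks helper and the
64 KB block size are assumptions of mine.

```ruby
# Hypothetical sketch of a sysread-based copy; not the script timed above.
BLOCK_SIZE = 64 * 1024 # assumed block size, chosen for illustration

# Copy everything from src to dst in large blocks using unbuffered
# sysread/syswrite. Returns the number of bytes copied.
def copy_blocks(src, dst, block_size = BLOCK_SIZE)
  total = 0
  loop do
    chunk = src.sysread(block_size) # raises EOFError at end of input
    # syswrite may write fewer bytes than requested, so drain the chunk.
    until chunk.empty?
      written = dst.syswrite(chunk)
      total += written
      chunk = chunk.byteslice(written..-1)
    end
  end
  total
rescue EOFError
  total
end

# Possible use, matching the invocation in the timings:
#   File.open(ARGV[0], "rb") { |f| copy_blocks(f, $stdout) }
```

Each sysread call hands back up to a whole block at once, so the
per-character overhead seen with getc never comes into play.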

Note that Linux "cat" doesn't move the data twice. Instead it mmaps
the file and writes that, which in this case apparently *does* still
transfer the data at least once, which it shouldn't need to... I would
have thought it would mmap pages set to fault on read, so that untouched
pages never get read.

Either way, Ruby can't do this, but it can use fread! What about it, Matz?
Since fread is almost as fast as read (0.086s against 0.085s above), the
restriction on not mixing sysread and read could perhaps be relaxed too?

--
Clifford Heath