On Thu, Apr 17, 2003 at 10:54:08PM +0900, Jim Freeze wrote:
> > The problems are:
> > - you want to process the file a *line* at a time
> > - you are allocating a new String object for each line
> > - you are calling 'yield' on a Ruby block for each line
> 
> For a 260MB file, Ruby spent 30% of its time in I/O and
> 70% processing the lines. (In this test case we were
> doing minimal processing.)
> 
> So, of that 30% (which was 80 seconds), what would be a fair
> estimate of the time saved if we swapped mmap for rb_io_gets?

I don't think that's a meaningful question, because mmap() doesn't do the
same thing as rb_io_gets.

I think what you are really asking about is rewriting rb_io_gets to do the
same things it currently does (i.e. divide the input into lines and allocate
a String for each one), but using mmap() instead of fread() to access the
file.
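Here is a minimal C sketch of what an mmap()-based line reader might look like. It is not Ruby's actual implementation, and the names each_line_mmap, line_cb and sum_lengths are hypothetical. The file is mapped in one go, but each line still has to be found with memchr(), copied into a freshly allocated buffer (the analogue of a new String) and passed to a callback (the analogue of yield). Only the fread() part of the cost changes.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical callback type, called once per line (stands in for yield). */
typedef void (*line_cb)(const char *line, size_t len, void *ctx);

/* Map the whole file, split it on '\n' and allocate a copy of each line,
   as rb_io_gets allocates a String. Returns the line count, or -1 on error. */
long each_line_mmap(const char *path, line_cb cb, void *ctx)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    struct stat st;
    if (fstat(fd, &st) < 0) { close(fd); return -1; }
    if (st.st_size == 0) { close(fd); return 0; } /* mmap of length 0 fails */
    size_t size = (size_t)st.st_size;
    char *base = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (base == MAP_FAILED) return -1;

    long count = 0;
    const char *p = base, *end = base + size;
    while (p < end) {
        const char *nl = memchr(p, '\n', (size_t)(end - p));
        size_t len = nl ? (size_t)(nl - p + 1) : (size_t)(end - p);
        /* Per-line allocation and copy: mmap() does not remove this work. */
        char *line = malloc(len + 1);
        if (!line) { munmap(base, size); return -1; }
        memcpy(line, p, len);
        line[len] = '\0';
        cb(line, len, ctx);
        free(line);
        count++;
        p += len;
    }
    munmap(base, size);
    return count;
}

/* Example callback: adds each line's length to the size_t pointed to by ctx. */
static void sum_lengths(const char *line, size_t len, void *ctx)
{
    (void)line;
    *(size_t *)ctx += len;
}
```

Mapping the file saves the copy from the kernel into a stdio buffer, but the scan, malloc/memcpy and callback per line are the same as before.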

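Did your program actually spend 30% of its time in fread()? Or was it 30% of
its time in rb_io_gets? It's an important distinction, because rb_io_gets
also splits the input into lines and allocates a String for each one, and
mmap() would do nothing to speed up that part.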
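One way to find out is to time the raw I/O on its own. A rough C sketch, with the hypothetical function name time_raw_fread: it reads the file in 64KB chunks with fread() and does nothing else, so comparing its elapsed time with the total for the Ruby run gives an idea of how much is really I/O and how much is line splitting, allocation and block calls. A repeated run may be served from the OS page cache, which will make the I/O look cheaper.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <time.h>

/* Read the whole file with fread() in large chunks and do no other work.
   Returns the number of bytes read (or -1 if the file cannot be opened)
   and stores the elapsed wall-clock time in seconds in *secs. */
long long time_raw_fread(const char *path, double *secs)
{
    static char buf[1 << 16]; /* 64KB read buffer */
    struct timespec t0, t1;
    FILE *f = fopen(path, "rb");
    if (!f) return -1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    long long total = 0;
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, f)) > 0)
        total += (long long)n;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    fclose(f);

    *secs = (double)(t1.tv_sec - t0.tv_sec)
          + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return total;
}
```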

As a guess, I'd say your savings would be minimal. If you decide to mmap
the whole 260MB into your address space at once, you may actually get worse
overall performance unless you have that much free RAM available.

Regards,

Brian.