On Sun, 28 Oct 2007 16:29:47 +0900, 7stud -- wrote:

> Konrad Meyer wrote:
>> Quoth 7stud --:
>>> #create a data file containing:
>>> 
>>>     else
>>>       break
>>>     end
>>> 
>>>   end
>>> end
>> 
>> IO#each_with_index and IO#readline are probably the same internally, so the
>> real answer here is that NO, IO#readline is NOT the same as
>> File.read.split('\n'), that's IO#readlines.
>>
>>
> The real question is: does readline do any buffering?

It must. There's no POSIX call that can read until the end of a line, so 
you have to read(2) a bunch of data, look for a newline, and if there's 
no newline in it you have to read more. If there is a newline in it, then 
you have to buffer everything you read that comes after the newline. 
That's life with POSIX.
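
To make that concrete, here is a rough Ruby sketch of the read/scan/keep-the-leftovers loop. It is not Ruby's actual implementation: the class name BufferedLineReader, the 4096-byte chunk size and the read_calls counter are all made up for illustration, and IO#sysread stands in for a bare read(2).

```ruby
require "stringio"

# Hypothetical line reader layered on unbuffered reads (illustration only).
class BufferedLineReader
  CHUNK_SIZE = 4096 # arbitrary chunk size, an assumption for this sketch

  attr_reader :read_calls

  def initialize(io)
    @io = io
    @buffer = String.new # bytes already read but not yet handed out
    @eof = false
    @read_calls = 0      # how many times we went back to the source
  end

  # Returns the next line (including its "\n"), or nil at end of input.
  def gets
    # Keep reading chunks until the buffer holds a newline or input ends.
    until (idx = @buffer.index("\n")) || @eof
      fill
    end

    if idx
      @buffer.slice!(0..idx)  # hand out one line, keep the rest buffered
    elsif @buffer.empty?
      nil                     # nothing left at all
    else
      @buffer.slice!(0..-1)   # final line with no trailing newline
    end
  end

  private

  # One unbuffered read; sysread raises EOFError when the input is exhausted.
  def fill
    @read_calls += 1
    @buffer << @io.sysread(CHUNK_SIZE)
  rescue EOFError
    @eof = true
  end
end

reader = BufferedLineReader.new(StringIO.new("one\ntwo\nthree"))
lines = []
while (line = reader.gets)
  lines << line
end
```

Three lines come out, but the source is only touched twice: once to fetch the data and once to discover end of input. Everything after the first newline sits in the buffer until the next call asks for it.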

The standard C library has fgets(3), which can find a newline, but it 
probably does its own buffering internally, for the same reasons that 
other POSIX apps would.

Ruby uses fread(3), the C library's equivalent of read(2), so Ruby has to 
do its own buffering.

> What about
> each()?  If a file has ten lines in it, does ruby access the file ten
> times?  Or, does ruby read some reasonable amount of data into a buffer?

rb_io_each_line implements IO#each_line and IO#each. It boils down to a 
loop:

    while (!NIL_P(str = rb_io_getline(rs, io))) {
        rb_yield(str);
    }

and rb_io_getline reads only as much as it needs to find that newline. 
It doesn't put the whole file in memory at once.
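
Seen from the Ruby side, the difference looks roughly like this. It is a small sketch using a throwaway Tempfile (the file name prefix and contents are made up); it only shows that each_line hands over one line per iteration while readlines returns them all, and says nothing about how many bytes are fetched underneath.

```ruby
require "tempfile"

file = Tempfile.new("each_line_demo")
file.write("a\nb\nc\n")
file.rewind

# each_line yields one line per iteration, so we can stop early.
seen = []
file.each_line do |line|
  seen << line
  break if seen.size == 2
end

# The loop stopped after two lines, so the third is still unread.
remaining = file.gets

# readlines, by contrast, returns every line as one array.
file.rewind
all_at_once = file.readlines

file.close
file.unlink
```

Breaking out of the block leaves the stream positioned after the second line, which is why the following gets returns the third one.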

--Ken


-- 
Ken Bloom. PhD candidate. Linguistic Cognition Laboratory.
Department of Computer Science. Illinois Institute of Technology.
http://www.iit.edu/~kbloom1/