On Sun, 28 Oct 2007 16:29:47 +0900, 7stud -- wrote:

> Konrad Meyer wrote:
>> Quoth 7stud --:
>>> #create a data file containing:
>>>
>>> else
>>> break
>>> end
>>>
>>> end
>>> end
>>
>> IO#each_with_index and IO#readline are probably the same internally, so
>> the real answer here is that NO, IO#readline is NOT the same as
>> File.read.split('\n'), that's IO#readlines.
>>
>
> The real question is: does readline do any buffering?

It must. There's no POSIX call that can read until the end of a line, so
you have to read(2) a bunch of data, look for a newline, and if there's no
newline in it you have to read more. If there is a newline in it, then you
have to buffer everything you read that comes after the newline. That's
life with POSIX.

The standard C library has fgets(3) which can find a newline, but it
probably does its own buffering internally, for the same reasons that
other POSIX apps would. Ruby uses fread(3), the C library's equivalent of
read(2), so ruby has to do its own buffering.

> What about each()? If a file has ten lines in it, does ruby access the
> file ten times? Or, does ruby read some reasonable amount of data into
> a buffer?

rb_io_each_line implements IO#each_line and IO#each. It boils down to a
loop:

    while (!NIL_P(str = rb_io_getline(rs, io))) {
        rb_yield(str);
    }

and rb_io_getline reads only as much as it feels is necessary to find that
newline. It doesn't put the whole file in memory at once.

--Ken

-- 
Ken Bloom. PhD candidate. Linguistic Cognition Laboratory.
Department of Computer Science. Illinois Institute of Technology.
http://www.iit.edu/~kbloom1/
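
For anyone who wants to see the read-a-chunk, look-for-a-newline, keep-the-rest idea in Ruby rather than C, here is a rough sketch. It is not how ruby's io.c actually does it. ChunkedLineReader, its chunk_size parameter, and the reads counter are all made up for illustration, and it assumes the underlying object responds to read(n) and returns nil at end of file.

```ruby
require "stringio"

# Hypothetical illustration only, not Ruby's real implementation.
# Wraps any object that responds to #read(n) and hands back one line at a
# time, fetching fixed-size chunks only when the buffer has no newline yet.
class ChunkedLineReader
  attr_reader :reads # number of read calls made on the underlying object

  def initialize(io, chunk_size = 4096)
    @io = io
    @chunk_size = chunk_size
    @buffer = String.new # bytes read but not yet returned to the caller
    @reads = 0
    @eof = false
  end

  # Returns the next line (including its "\n"), or nil at end of input.
  def gets
    until (idx = @buffer.index("\n"))
      break if @eof
      chunk = @io.read(@chunk_size)
      @reads += 1
      if chunk.nil? || chunk.empty?
        @eof = true
      else
        @buffer << chunk # keep everything, including data past the newline
      end
    end

    if idx
      @buffer.slice!(0..idx)  # hand back one line, leave the rest buffered
    elsif @buffer.empty?
      nil                     # nothing left at all
    else
      @buffer.slice!(0..-1)   # final line with no trailing newline
    end
  end

  # Same shape as the C loop: call gets until it returns nil.
  def each_line
    while (line = gets)
      yield line
    end
  end
end
```

Wrapping a ten-line StringIO in this and calling each_line yields ten lines while touching the underlying object far fewer than ten times, since the first chunk already contains every newline. That is the point of the buffering: the number of reads depends on the chunk size, not on the number of lines.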