On Dec 7, 3:29 am, Jano Svitok <jan.svi... / gmail.com> wrote:
> I'd *assume* the former saves you a bunch of allocations when looping
> through a file
> (I assume the buffer is reused instead of allocating a new one for
> each iteration).

I'm not the smartest C programmer (or the smartest anything
programmer), but I'm not seeing any optimization in the actual C code.
Please correct me if I'm wrong.

First, io_read() is the function called in the backend from IO#read.
The relevant lines are:

====
    rb_scan_args(argc, argv, "02", &length, &str);

    if (NIL_P(length)) {
	if (!NIL_P(str)) StringValue(str);
	GetOpenFile(io, fptr);
	rb_io_check_readable(fptr);
	return read_all(fptr, remain_size(fptr), str);
    }
    len = NUM2LONG(length);
    if (len < 0) {
	rb_raise(rb_eArgError, "negative length %ld given", len);
    }

    if (NIL_P(str)) {
	str = rb_tainted_str_new(0, len);
    }
    else {
	StringValue(str);
	rb_str_modify(str);
	rb_str_resize(str,len);
    }
====

So we see that we get a new string from rb_tainted_str_new if a buffer
is not passed in to IO#read; otherwise str is used, and we call
StringValue, rb_str_modify and rb_str_resize on it.
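
To make the two branches easier to compare, here is a small standalone
model of just the buffer selection. It is only a sketch: struct toy_str
and toy_pick_buffer are made-up names for illustration, not anything
from Ruby's source, and the "reuse" branch always calls realloc, which
is a simplification of what rb_str_resize does.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a Ruby 1.8 String; not the real struct RString. */
struct toy_str {
    long len;
    char *ptr;
};

/*
 * Models only the buffer selection in io_read(). With no caller buffer a
 * new object is created (the rb_tainted_str_new branch); otherwise the
 * caller's object is kept and only its storage is resized (the else branch).
 * Simplification: the real rb_str_resize() may skip the realloc entirely.
 * Returns NULL on a negative length or on allocation failure.
 */
static struct toy_str *toy_pick_buffer(struct toy_str *given, long len)
{
    char *p;

    if (len < 0) return NULL;

    if (given == NULL) {
        struct toy_str *s = malloc(sizeof *s);
        if (!s) return NULL;
        s->ptr = calloc((size_t)len + 1, 1);   /* zero-filled, NUL-terminated */
        if (!s->ptr) { free(s); return NULL; }
        s->len = len;
        return s;
    }

    p = realloc(given->ptr, (size_t)len + 1);
    if (!p) return NULL;
    p[len] = '\0';                             /* sentinel, as in rb_str_resize */
    given->ptr = p;
    given->len = len;
    return given;                              /* same object handed back */
}
```

In this model the caller gets its own object back when it supplies one,
and a distinct object each time when it does not. Whether the storage
behind a reused object is reallocated is a separate question.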

So what is StringValue? A macro defined in ruby.h:

====
#define StringValue(v) rb_string_value(&(v))
====

And what is rb_string_value()? A function from string.c:

====
static char *null_str = "";

VALUE
rb_string_value(ptr)
    volatile VALUE *ptr;
{
    VALUE s = *ptr;
    if (TYPE(s) != T_STRING) {
	s = rb_str_to_str(s);
	*ptr = s;
    }
    if (!RSTRING(s)->ptr) {
	FL_SET(s, ELTS_SHARED);
	RSTRING(s)->ptr = null_str;
    }
    return s;
}
====

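So if it's not a string, we convert it to one. And if its ptr is NULL,
we flag it as shared and point it at the static empty string null_str.
A string that already holds data comes back untouched.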
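
To check my reading of the second if in rb_string_value(), here is a
standalone model of just that test. It is a sketch under assumptions:
struct toy_str, toy_null_str and toy_string_value are hypothetical
names, the shared field stands in for the ELTS_SHARED flag, and the
type-conversion step is left out.

```c
#include <stddef.h>

/* Hypothetical stand-in for a Ruby 1.8 String; not the real struct RString. */
struct toy_str {
    long len;
    char *ptr;
    int shared;               /* plays the role of the ELTS_SHARED flag */
};

/* Static empty string, playing the role of null_str in string.c. */
static char toy_null_str[] = "";

/*
 * Mirrors only the NULL-pointer check of rb_string_value(): a string with
 * no storage is pointed at the static empty string and flagged as shared.
 * A string whose ptr is already set is returned without any change.
 */
static struct toy_str *toy_string_value(struct toy_str *s)
{
    if (!s->ptr) {
        s->shared = 1;
        s->ptr = toy_null_str;
    }
    return s;
}
```

In this model the branch only fires for a string whose ptr is NULL; a
buffer that already contains data passes through with its ptr and len
intact.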

But the interesting lines are back up in io_read():

====
	rb_str_modify(str);
	rb_str_resize(str,len);
====

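Now rb_str_modify() (string.c) is called on that string. As far as I
can tell, it only calls str_make_independent() when the string is not
already independent (for example, when it is shared or points at
null_str). That function looks like this: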

====
static void
str_make_independent(str)
    VALUE str;
{
    char *ptr;

    ptr = ALLOC_N(char, RSTRING(str)->len+1);
    if (RSTRING(str)->ptr) {
	memcpy(ptr, RSTRING(str)->ptr, RSTRING(str)->len);
    }
    ptr[RSTRING(str)->len] = 0;
    RSTRING(str)->ptr = ptr;
    RSTRING(str)->aux.capa = RSTRING(str)->len;
    FL_UNSET(str, STR_NOCAPA);
}
====

And finally, rb_str_resize is called:

====
VALUE
rb_str_resize(str, len)
    VALUE str;
    long len;
{
    if (len < 0) {
	rb_raise(rb_eArgError, "negative string size (or size too big)");
    }

    rb_str_modify(str);
    if (len != RSTRING(str)->len) {
	if (RSTRING(str)->len < len || RSTRING(str)->len - len > 1024) {
	    REALLOC_N(RSTRING(str)->ptr, char, len+1);
	    if (!FL_TEST(str, STR_NOCAPA)) {
		RSTRING(str)->aux.capa = len;
	    }
	}
	RSTRING(str)->len = len;
	RSTRING(str)->ptr[len] = '\0';	/* sentinel */
    }
    return str;
}
====
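
To see when the REALLOC_N branch is actually reached, the length checks
above can be modeled in isolation. This is a sketch, not Ruby's code:
struct toy_str, toy_realloc_count and toy_str_resize are hypothetical
names, plain realloc stands in for REALLOC_N, and the rb_str_modify()
call and the STR_NOCAPA test are left out.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a Ruby 1.8 String; not the real struct RString. */
struct toy_str {
    long len;
    char *ptr;
    long capa;
};

/* Counts how many times the model reaches realloc. */
static int toy_realloc_count = 0;

/*
 * Mirrors the length checks of rb_str_resize(): storage is reallocated only
 * when the string grows, or when it shrinks by more than 1024 bytes.
 * Precondition: s->ptr is NULL only when s->len is 0.
 * Returns 0 on success, -1 on a negative length or allocation failure.
 */
static int toy_str_resize(struct toy_str *s, long len)
{
    if (len < 0) return -1;

    if (len != s->len) {
        if (s->len < len || s->len - len > 1024) {
            char *p = realloc(s->ptr, (size_t)len + 1);
            if (!p) return -1;
            s->ptr = p;
            s->capa = len;
            toy_realloc_count++;
        }
        s->len = len;
        s->ptr[len] = '\0';   /* sentinel */
    }
    return 0;
}
```

In this model, resizing a 4096-byte buffer to 4096 again does nothing,
and shrinking it by 1024 bytes or fewer keeps the old allocation; only
growth or a larger shrink reaches realloc. It says nothing about what
rb_str_modify() does before that point.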

Now, like I said, I'm not the greatest C programmer...but I fail to
see how, if I'm reading the code above correctly, passing in a buffer
string to IO#read is any more optimal than creating a new string (even
when looping many times), since it appears to me to be doing the same
thing (compare str_new from string.c, which is what rb_tainted_str_new
calls).

Regards,
Jordan

----
References:

http://svn.ruby-lang.org/repos/ruby/branches/ruby_1_8/io.c
http://svn.ruby-lang.org/repos/ruby/branches/ruby_1_8/ruby.h
http://svn.ruby-lang.org/repos/ruby/branches/ruby_1_8/string.c