On Dec 7, 4:56 am, MonkeeSage <MonkeeS... / gmail.com> wrote:
> On Dec 7, 3:29 am, Jano Svitok <jan.svi... / gmail.com> wrote:
>
> > I'd *assume* the former saves you a bunch of allocations when looping
> > through a file (I assume the buffer is reused instead of allocating a
> > new one for each iteration).
>
> I'm not the smartest C programmer (or the smartest anything
> programmer), but I'm not seeing any optimization in the actual C code.
> Please correct me if I'm wrong.
>
> First, io_read() is the C function that implements IO#read.
> The relevant lines are:
>
> ====
>     rb_scan_args(argc, argv, "02", &length, &str);
>
>     if (NIL_P(length)) {
>         if (!NIL_P(str)) StringValue(str);
>         GetOpenFile(io, fptr);
>         rb_io_check_readable(fptr);
>         return read_all(fptr, remain_size(fptr), str);
>     }
>     len = NUM2LONG(length);
>     if (len < 0) {
>         rb_raise(rb_eArgError, "negative length %ld given", len);
>     }
>
>     if (NIL_P(str)) {
>         str = rb_tainted_str_new(0, len);
>     }
>     else {
>         StringValue(str);
>         rb_str_modify(str);
>         rb_str_resize(str,len);
>     }
> ====
>
> So we see that we get a new string from rb_tainted_str_new() if no
> buffer is passed to IO#read; otherwise the buffer str is used and we
> call StringValue on it.
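The two branches can be modelled in a few lines of plain C. This is a hypothetical sketch, not MRI source: `buf_t`, `destination()` and `data_allocs` are invented names, and the reuse branch simplifies rb_str_resize() to "reallocate only when the length changes".

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a Ruby String; not MRI's RString layout. */
typedef struct {
    char  *ptr;
    size_t len;
} buf_t;

static int data_allocs = 0;   /* malloc/realloc calls for character data */

/* Models io_read()'s two branches: with no caller buffer, allocate a new one
 * (the rb_tainted_str_new path); otherwise reuse the caller's object and only
 * reallocate its data when the length changes (simplified rb_str_resize). */
static buf_t *destination(buf_t *str, size_t len)
{
    if (str == NULL) {
        str = malloc(sizeof *str);
        assert(str != NULL);
        str->ptr = malloc(len + 1);
        assert(str->ptr != NULL);
        data_allocs++;
    } else if (str->len != len) {
        char *p = realloc(str->ptr, len + 1);
        assert(p != NULL);
        str->ptr = p;
        data_allocs++;
    }
    str->len = len;
    str->ptr[len] = '\0';   /* sentinel, as in rb_str_resize() */
    return str;
}
```

Calling `destination(NULL, n)` corresponds to `io.read(n)`; passing an existing buffer corresponds to `io.read(n, buf)`, and the same object comes back.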
>
> So what is StringValue? A macro defined in ruby.h:
>
> ====
> #define StringValue(v) rb_string_value(&(v))
> ====
>
> And what is rb_string_value()? A function from string.c:
>
> ====
> static char *null_str = "";
>
> VALUE
> rb_string_value(ptr)
>     volatile VALUE *ptr;
> {
>     VALUE s = *ptr;
>     if (TYPE(s) != T_STRING) {
>         s = rb_str_to_str(s);
>         *ptr = s;
>     }
>     if (!RSTRING(s)->ptr) {
>         FL_SET(s, ELTS_SHARED);
>         RSTRING(s)->ptr = null_str;
>     }
>     return s;
> }
> ====
>
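> So if it's not a string, we convert it to one; if it is a string whose
> data pointer is NULL, we point it at the shared empty string null_str.
> Otherwise the string is returned unchanged.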
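A minimal sketch of the second half of rb_string_value(), using a hypothetical `toy_str` struct in place of RString and an int field in place of the ELTS_SHARED flag. The type-conversion half (rb_str_to_str) is left out; the point is that only a NULL data pointer is touched.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy string: `shared` plays the role of ELTS_SHARED. */
typedef struct {
    char *ptr;
    long  len;
    int   shared;
} toy_str;

static char null_str[] = "";   /* one static empty string, as in string.c */

/* A string with a NULL data pointer is pointed at the static empty string
 * and flagged as shared; any other string is returned untouched (its
 * contents are not cleared). */
static toy_str *string_value(toy_str *s)
{
    assert(s != NULL);
    if (s->ptr == NULL) {
        s->shared = 1;
        s->ptr = null_str;
    }
    return s;
}
```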
>
> But the interesting lines are back up in io_read():
>
> ====
>         rb_str_modify(str);
>         rb_str_resize(str,len);
> ====
>
> Now rb_str_modify() (string.c) is called on our string, and if the
> string shares its buffer with another string, it calls
> str_make_independent():
>
> ====
> static void
> str_make_independent(str)
>     VALUE str;
> {
>     char *ptr;
>
>     ptr = ALLOC_N(char, RSTRING(str)->len+1);
>     if (RSTRING(str)->ptr) {
>         memcpy(ptr, RSTRING(str)->ptr, RSTRING(str)->len);
>     }
>     ptr[RSTRING(str)->len] = 0;
>     RSTRING(str)->ptr = ptr;
>     RSTRING(str)->aux.capa = RSTRING(str)->len;
>     FL_UNSET(str, STR_NOCAPA);
> }
> ====
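The copy-on-write step can be sketched the same way. `toy_str`, `make_independent()` and `str_modify()` are hypothetical names. The guard in `str_modify()` (copy only when shared) is an assumption based on 1.8's string.c, where rb_str_modify() consults str_independent(); that code is not quoted above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy string; `shared` stands in for ELTS_SHARED. */
typedef struct {
    char *ptr;
    long  len;
    long  capa;
    int   shared;
} toy_str;

/* Same steps as str_make_independent(): allocate len+1 bytes, copy the old
 * contents, terminate, record the capacity, and clear the shared flag
 * (assumed to be what FL_UNSET(str, STR_NOCAPA) achieves). */
static void make_independent(toy_str *s)
{
    char *p = malloc((size_t)s->len + 1);
    assert(p != NULL);
    if (s->ptr) {
        memcpy(p, s->ptr, (size_t)s->len);
    }
    p[s->len] = '\0';
    s->ptr = p;
    s->capa = s->len;
    s->shared = 0;
}

/* Assumed rb_str_modify() behaviour: copy only when the buffer is shared;
 * an already independent string is left alone. */
static void str_modify(toy_str *s)
{
    if (s->shared) {
        make_independent(s);
    }
}
```

Under that assumption, a buffer reused across reads pays for the copy at most once.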
>
> And finally, rb_str_resize is called:
>
> ====
> VALUE
> rb_str_resize(str, len)
>     VALUE str;
>     long len;
> {
>     if (len < 0) {
>         rb_raise(rb_eArgError, "negative string size (or size too big)");
>     }
>
>     rb_str_modify(str);
>     if (len != RSTRING(str)->len) {
>         if (RSTRING(str)->len < len || RSTRING(str)->len - len > 1024) {
>             REALLOC_N(RSTRING(str)->ptr, char, len+1);
>             if (!FL_TEST(str, STR_NOCAPA)) {
>                 RSTRING(str)->aux.capa = len;
>             }
>         }
>         RSTRING(str)->len = len;
>         RSTRING(str)->ptr[len] = '\0';       /* sentinel */
>     }
>     return str;
> }
> ====
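rb_str_resize()'s reallocation rule is what matters for a reused buffer. A sketch with a hypothetical `toy_str` and `realloc_count`, copying only the length test from the code above (the STR_NOCAPA/capa bookkeeping is omitted):

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    char *ptr;
    long  len;
} toy_str;

static int realloc_count = 0;

/* Same decision as rb_str_resize(): reallocate only when growing, or when
 * shrinking by more than 1024 bytes; otherwise just move the terminator. */
static void str_resize(toy_str *s, long len)
{
    assert(len >= 0);
    if (len != s->len) {
        if (s->len < len || s->len - len > 1024) {
            char *p = realloc(s->ptr, (size_t)len + 1);
            assert(p != NULL);
            s->ptr = p;
            realloc_count++;
        }
        s->len = len;
        s->ptr[len] = '\0';   /* sentinel */
    }
}
```

Asking for the same length is a no-op, and small shrinks keep the existing block.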
>
> Now, like I said, I'm not the greatest C programmer... but I fail to
> see how, if I'm reading the code above correctly, passing a buffer
> string to IO#read is any more efficient than creating a new string
> (even when looping many times), since it appears to be doing the same
> thing (compare str_new() in string.c, which is what rb_tainted_str_new()
> calls).
>
> Regards,
> Jordan
>
> ----
> References:
>
> http://svn.ruby-lang.org/repos/ruby/branches/ruby_1_8/io.c
> http://svn.ruby-lang.org/repos/ruby/branches/ruby_1_8/ruby.h
> http://svn.ruby-lang.org/repos/ruby/branches/ruby_1_8/string.c

Oh... wait... I'm completely dense. Duh! io_read() is going to create or
re-initialize a string anyway to put its results in. So if I create
a new string separately to store the return value of IO#read, then
I'm causing an extra allocation and copy. Sorry for wasting space.
Have pity on mentally handicapped people like me. :P
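To make the conclusion concrete, here is a hypothetical model of a read loop: `read_chunk()` copies from an in-memory source instead of a file, and `count_allocs()` counts data allocations with and without a reused buffer. It assumes a constant chunk size, which is the case where rb_str_resize() (quoted above) sees an unchanged length and skips REALLOC_N.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int alloc_count = 0;

/* Copies `len` bytes of `src` starting at `off` into `buf`. With buf == NULL
 * a new buffer is allocated (like io.read(len)); otherwise `buf` is reused
 * (like io.read(len, buf)) and must already hold at least len + 1 bytes. */
static char *read_chunk(const char *src, size_t off, size_t len, char *buf)
{
    if (buf == NULL) {
        buf = malloc(len + 1);
        assert(buf != NULL);
        alloc_count++;
    }
    memcpy(buf, src + off, len);
    buf[len] = '\0';
    return buf;
}

/* Reads `total` bytes of `src` in `len`-byte chunks and returns how many
 * allocations were made; `reuse` selects the buffer-reusing loop. */
static int count_allocs(const char *src, size_t total, size_t len, int reuse)
{
    char *buf = NULL;
    alloc_count = 0;
    for (size_t off = 0; off + len <= total; off += len) {
        char *chunk = read_chunk(src, off, len, reuse ? buf : NULL);
        if (reuse) {
            buf = chunk;      /* keep the same buffer for the next pass */
        } else {
            free(chunk);      /* fresh-string path: discard each result */
        }
    }
    free(buf);                /* NULL on the fresh-string path */
    return alloc_count;
}
```

Reading 16 bytes in 4-byte chunks allocates four times without a buffer and once with one.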

Regards,
Jordan