On Dec 7, 4:56 am, MonkeeSage <MonkeeS... / gmail.com> wrote:
> On Dec 7, 3:29 am, Jano Svitok <jan.svi... / gmail.com> wrote:
>
> > I'd *assume* the former saves you a bunch of allocations when looping
> > through a file
> > (I assume the buffer is reused instead of allocating a new one for
> > each iteration).
>
> I'm not the smartest C programmer (or the smartest anything
> programmer), but I'm not seeing any optimization in the actual C code.
> Please correct me if I'm wrong.
>
> First, io_read() is the function called in the backend from IO#read.
> The relevant lines are:
>
> ====
> rb_scan_args(argc, argv, "02", &length, &str);
>
> if (NIL_P(length)) {
>     if (!NIL_P(str)) StringValue(str);
>     GetOpenFile(io, fptr);
>     rb_io_check_readable(fptr);
>     return read_all(fptr, remain_size(fptr), str);
> }
> len = NUM2LONG(length);
> if (len < 0) {
>     rb_raise(rb_eArgError, "negative length %ld given", len);
> }
>
> if (NIL_P(str)) {
>     str = rb_tainted_str_new(0, len);
> }
> else {
>     StringValue(str);
>     rb_str_modify(str);
>     rb_str_resize(str,len);
> }
> ====
>
> So we see that we get a new string from rb_tainted_str_new if a buffer
> is not passed in to IO#read; otherwise str is used and we call
> StringValue on it.
>
> So what is StringValue? A macro defined in ruby.h:
>
> ====
> #define StringValue(v) rb_string_value(&(v))
> ====
>
> And what is rb_string_value()? A function from string.c:
>
> ====
> static char *null_str = "";
>
> VALUE
> rb_string_value(ptr)
>     volatile VALUE *ptr;
> {
>     VALUE s = *ptr;
>     if (TYPE(s) != T_STRING) {
>         s = rb_str_to_str(s);
>         *ptr = s;
>     }
>     if (!RSTRING(s)->ptr) {
>         FL_SET(s, ELTS_SHARED);
>         RSTRING(s)->ptr = null_str;
>     }
>     return s;
> }
> ====
>
> So if it's not a string, we convert it to one; and if it has no
> character buffer yet, we point it at the shared static empty string
> (null_str) and flag it ELTS_SHARED.
>
> But the interesting lines are back up in io_read():
>
> ====
> rb_str_modify(str);
> rb_str_resize(str,len);
> ====
>
> Now rb_str_modify() (string.c) is called with our string.
> And it in turn calls str_make_independent() when the string's buffer
> is shared (ELTS_SHARED is set):
>
> ====
> static void
> str_make_independent(str)
>     VALUE str;
> {
>     char *ptr;
>
>     ptr = ALLOC_N(char, RSTRING(str)->len+1);
>     if (RSTRING(str)->ptr) {
>         memcpy(ptr, RSTRING(str)->ptr, RSTRING(str)->len);
>     }
>     ptr[RSTRING(str)->len] = 0;
>     RSTRING(str)->ptr = ptr;
>     RSTRING(str)->aux.capa = RSTRING(str)->len;
>     FL_UNSET(str, STR_NOCAPA);
> }
> ====
>
> And finally, rb_str_resize is called:
>
> ====
> VALUE
> rb_str_resize(str, len)
>     VALUE str;
>     long len;
> {
>     if (len < 0) {
>         rb_raise(rb_eArgError, "negative string size (or size too big)");
>     }
>
>     rb_str_modify(str);
>     if (len != RSTRING(str)->len) {
>         if (RSTRING(str)->len < len || RSTRING(str)->len - len > 1024) {
>             REALLOC_N(RSTRING(str)->ptr, char, len+1);
>             if (!FL_TEST(str, STR_NOCAPA)) {
>                 RSTRING(str)->aux.capa = len;
>             }
>         }
>         RSTRING(str)->len = len;
>         RSTRING(str)->ptr[len] = '\0'; /* sentinel */
>     }
>     return str;
> }
> ====
>
> Now, like I said, I'm not the greatest C programmer...but I fail to
> see how, if I'm reading the code above correctly, passing in a buffer
> string to IO#read is any more efficient than creating a new string (even
> when looping many times), since it appears to me to be doing the same
> thing (compare str_new from string.c, which is what rb_tainted_str_new
> calls).
>
> Regards,
> Jordan
>
> ----
> References:
>
> http://svn.ruby-lang.org/repos/ruby/branches/ruby_1_8/io.c
> http://svn.ruby-lang.org/repos/ruby/branches/ruby_1_8/ruby.h
> http://svn.ruby-lang.org/repos/ruby/branches/ruby_1_8/string.c

Oh...wait...I'm completely dense. Duh! io_read() is going to create (or
re-initialize) a string anyway to put its results in. So if I create a
new string independently to store the return value of IO#read, then I'm
causing an extra allocation and copy.

Sorry for wasting space. Have pity on mentally handicapped people like me. :P

Regards,
Jordan
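A minimal Ruby sketch of the two calling styles being compared (not from the thread; `read_chunks` is a hypothetical helper, and it assumes a modern Ruby, 2.1+ for Tempfile.create, while the C above is 1.8). It only checks object identity: with a buffer argument, IO#read fills and returns the same String on every call; without one, each call hands back a new String. It does not measure speed.

```ruby
require 'tempfile'

# Read +io+ in +chunk_size+ pieces; return [total_bytes, distinct_strings].
# reuse: true  -> pass one String as IO#read's buffer argument every time
# reuse: false -> let IO#read allocate a fresh String for each chunk
def read_chunks(io, chunk_size, reuse:)
  buffer = reuse ? String.new : nil
  chunks = []   # keep every returned String alive so object_ids stay distinct
  total = 0
  loop do
    chunk = reuse ? io.read(chunk_size, buffer) : io.read(chunk_size)
    break if chunk.nil?   # IO#read(length, ...) returns nil at end of file
    total += chunk.bytesize
    chunks << chunk
  end
  [total, chunks.map(&:object_id).uniq.size]
end

Tempfile.create('demo') do |f|
  f.write('a' * 10)
  f.flush
  f.rewind
  p read_chunks(f, 4, reuse: true)    # prints [10, 1]: three reads, one String
  f.rewind
  p read_chunks(f, 4, reuse: false)   # prints [10, 3]: three reads, three Strings
end
```

The buffer version only saves the String object itself; as the C above shows, io_read still resizes (and may reallocate) the buffer's storage as needed.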