Issue #13597 has been updated by emilys (Emily Stolfo).


Hi Eric,

Thank you so much for your response - it provided a lot of useful information I wouldn't have found otherwise. I've pointed the user who opened the pull request to your response so he has a chance to update his code based on the new information.

I haven't heard back from him yet, but in the meantime I'll do some testing and see what I find to be the optimal solution. I'll certainly ping you again if I have questions... and I look forward to perhaps being able to pass an offset to read_nonblock in the future.

Thanks again,
Emily


normalperson (Eric Wong) wrote:
> emily / mongodb.com wrote:
>  > Hello
>  > 
>  > I've observed that a lot of memory gets allocated and wasted
>  > when read_nonblock is called for a number of bytes much larger
>  > than is actually read from the socket.  This line
>  > https://github.com/ruby/ruby/blob/0130bb01baed404c0e3c75bd5db472415a6da1d3/io.c#L2686
>  > appears to eventually only change the heap size value here
>  > https://github.com/ruby/ruby/blob/144e06700705a3f067582682567bc77b429c4fca/string.c#L104
>  > but does not call realloc.
>  
>  Correct.  We do not realloc here since there is a good chance
>  the buffer can be reused soon after and need the larger size.
>  realloc can be very expensive.
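>  
>  A minimal sketch to observe the retained capacity (using a
>  pipe just for illustration; exact numbers vary by platform and
>  Ruby version, and later Rubies may shrink oversized read
>  buffers automatically):
>  
>      require 'objspace'
>  
>      r, w = IO.pipe
>      w.write('x' * 1_000)
>      buf = r.read_nonblock(16_000_000)  # ask for 16MB, get 1000 bytes
>      buf.bytesize                 # => 1000
>      ObjectSpace.memsize_of(buf)  # => roughly 16MB; capacity is kept
>      buf.clear                    # frees the heap buffer
>      ObjectSpace.memsize_of(buf)  # => small again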
>  
>  > I see this request to allow an offset to be passed to read_nonblock:
>  > https://bugs.ruby-lang.org/issues/11484
>  
>  Thanks for pinging on that, I guess I'll try implementing it at
>  some point (but I will need matz approval to make API changes).
>  
>  > but until that is implemented, how do you recommend
>  > efficiently asking to read a large number of bytes from a
>  > socket? If I'm not mistaken, if I request 16000000, but only
>  > read 1000000, the buffer that has been allocated in
>  > io_read_nonblock for 16000000 doesn't seem to be resized.
>  
>  You can use String#clear right away on the result:
>  
>      rbuf = ''
>      tmp = ''
>      case ret = io.read_nonblock(16384, tmp, exception: false)
>      when String
>        # tmp.object_id == ret.object_id at this point
>        rbuf << ret
>        ret.clear # calls free(3) internally
>      else
>        ...
>      end while true
>  
>  And you can also clear the bigger rbuf when you're done.
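>  
>  For example (parse here is just a hypothetical consumer of the
>  accumulated data):
>  
>      message = parse(rbuf)  # hypothetical
>      rbuf.clear             # release the large accumulated buffer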
>  
>  Coincidentally, I made a similar change to net/protocol for
>  net/http in the stdlib this weekend:
>  
>  https://svn.ruby-lang.org/cgi-bin/viewvc.cgi?view=revision&revision=58840
>  
>  But of course, I expect a destination offset [Feature #11484]
>  to be more helpful.
>  
>  > Would you recommend instead requesting a more predictable
>  > number of bytes, closer to the default system value
>  > (SO_RCVBUF, for example) in each call to read_nonblock?
>  
>  That might be too complicated and a waste of syscalls in the
>  general case.  I'm not sure I saw value in going with sizes
>  larger than 1MB, and usually 16K is fine.  Using giant values
>  like 16MB will blow away your CPU cache.  Maybe, (just maybe)
>  16MB helps with really big transfers across LFNs
>  (long-fat-networks), but I doubt that's a common case for DBs
>  :)
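>  
>  If you do want to size reads from SO_RCVBUF, it can be checked
>  like this (a sketch; the host/port are placeholders, and Linux
>  reports roughly double the usable size for bookkeeping):
>  
>      require 'socket'
>  
>      s = TCPSocket.new('example.com', 80) # placeholder
>      opt = s.getsockopt(Socket::SOL_SOCKET, Socket::SO_RCVBUF)
>      opt.int # => e.g. 87380 on Linux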
>  
>  > For context, this pull request against the MongoDB Ruby driver has led me to this investigation. https://github.com/mongodb/mongo-ruby-driver/pull/864
>  
>  I don't agree with GitHub's Terms-of-Service nor do I run
>  Javascript or look at images; but I dumped that text and read
>  it; so I'll add some notes here:
>  
>  In my experience, 4K is too small for even 70ms latency
>  connections, but that might've just been on the writing
>  side...  I would choose 8K, at least, but usually 16K.  It
>  also depends on network latency and hardware.
>  
>  Choosing 16K also has a good side effect with current CRuby:
>  the malloc implementation can reuse the space Ruby allocates
>  for its internal buffers, potentially reducing fragmentation
>  and helping cache latency.  And we (CRuby) have been using
>  16K for most IO buffers for a long time...
>  
>  Anyways, I'll be glad to help with further network-related
>  Ruby stuff on here as long as everything is plain text.



----------------------------------------
Misc #13597: Does read_nonblock call realloc for the buffer or does it just set the size attribute?
https://bugs.ruby-lang.org/issues/13597#change-65153

* Author: emilys (Emily Stolfo)
* Status: Open
* Priority: Normal
* Assignee: 
----------------------------------------
Hello

I've observed that a lot of memory gets allocated and wasted when read_nonblock is called for a number of bytes much larger than is actually read from the socket.
This line https://github.com/ruby/ruby/blob/0130bb01baed404c0e3c75bd5db472415a6da1d3/io.c#L2686 appears to eventually only change the heap size value here https://github.com/ruby/ruby/blob/144e06700705a3f067582682567bc77b429c4fca/string.c#L104 but does not call realloc.

I see this request to allow an offset to be passed to read_nonblock:
https://bugs.ruby-lang.org/issues/11484

but until that is implemented, how do you recommend efficiently asking to read a large number of bytes from a socket? If I'm not mistaken, if I request 16000000, but only read 1000000, the buffer that has been allocated in io_read_nonblock for 16000000 doesn't seem to be resized.

Would you recommend instead requesting a more predictable number of bytes, closer to the default system value (SO_RCVBUF, for example) in each call to read_nonblock?

For context, this pull request against the MongoDB Ruby driver has led me to this investigation. https://github.com/mongodb/mongo-ruby-driver/pull/864

Thank you in advance
Emily





-- 
https://bugs.ruby-lang.org/
