["Jan Svitok" <jan.svitok / gmail.com>]
| 
| Hi,
| there seems to be HTTPResponse#read_body, that can provide the chunks
| as they come (not tested, copy&paste from docs:
| 
|  # using iterator
|   http.request_get('/index.html') {|res|
|     res.read_body do |segment|
|       print segment
|     end
|   }

thanks!

indeed, this helped a bit, but not too much.  from the looks of it the
standard libraries seem to hard-code the read buffer size to 1024
(Ruby 1.8, net/protocol.rb) which results in at least twice the number
of system calls to read(2) for the same amount of data.  I
experimentally upped the read buffer to 10k, and now I seem to get
buffer-fulls roughly equal to the MTU of the interface the data is
read from.

when it is at 1024 bytes I consistently get one buffer of 1024 bytes,
then a buffer of approximately MTU - 1024 bytes, then 1024 bytes
again, and so on.
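the read-call arithmetic can be sanity-checked without touching
net/protocol.rb at all.  a minimal sketch (CountingIO and drain are
made-up names, not part of any library) that counts how many read
calls it takes to drain the same 100 KB stream with a 1 KB versus a
10 KB buffer:

```ruby
require 'stringio'

# Wrapper that counts successful sysread calls on the underlying IO.
class CountingIO
  attr_reader :calls

  def initialize(io)
    @io = io
    @calls = 0
  end

  def sysread(n)
    chunk = @io.sysread(n)  # raises EOFError when the stream is drained
    @calls += 1
    chunk
  end
end

# Read the stream to EOF with a fixed buffer size; return the call count.
def drain(io, bufsize)
  loop { io.sysread(bufsize) }
rescue EOFError
  io.calls
end

data  = 'x' * 100_000
small = drain(CountingIO.new(StringIO.new(data)), 1024)   # 1 KB buffer
large = drain(CountingIO.new(StringIO.new(data)), 10_240) # 10 KB buffer
# roughly 10x fewer read calls with the larger buffer
```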

even after modifying the hard-coded buffer size to 10k it still eats
obscene amounts of CPU for what it is doing.  I would have expected
any reasonable implementation to eat at most 1% CPU (probably less)
for what is almost pure IO. (it now consumes about 35% CPU on a 2 GHz
AMD).

anyway, note to implementors: it might be an idea to pick a buffer
size larger than 1024 bytes if you are going to hard code it.  at the
very least 4k or 8k would be more sensible.  preferably it should be
configurable (but with a sensible default value) so the user can make
an informed decision to increase or decrease the size as needed.
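for what it's worth, the suggestion above could be sketched like
this.  BufferedReader, DEFAULT_BUFSIZE, and the bufsize option are
invented names for illustration; this is not the actual
net/protocol.rb interface, just the shape a configurable default
could take:

```ruby
require 'stringio'

# Sketch of a reader with a sensible default buffer size that the
# caller can override per instance.
class BufferedReader
  DEFAULT_BUFSIZE = 8 * 1024  # 8 KB default, per the suggestion above

  def initialize(io, bufsize: DEFAULT_BUFSIZE)
    @io = io
    @bufsize = bufsize
  end

  # Yield each chunk as it is read, like HTTPResponse#read_body does.
  def each_chunk
    while (chunk = @io.read(@bufsize))
      yield chunk
    end
  end
end

# usage: default 8 KB reads over a 20 KB stream
sizes = []
BufferedReader.new(StringIO.new('a' * 20_000)).each_chunk do |c|
  sizes << c.bytesize
end
```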

-Björn