(Note: I posted this to comp.lang.ruby, expecting it to filter
through to the appropriate mailing lists, but it did not.)

It seems that net/http's implementation is extremely inefficient when
it comes to dealing with large files.

I think this is something worth fixing in subsequent versions. It
shouldn't be as bad as it is. I would also appreciate any hints or
advice on working around the problem.

Specifically, I am interested in HTTP GETs (from net/http) and HTTP PUTs 
(both on the net/http side and WEBrick receiving side) that have 
adequate streaming performance. I would like to GET and PUT fairly large 
files, and don't want to pay such a large network and CPU performance 
overhead.

Below I have attached a test suite that illustrates the problem. I used
WEBrick as the server.

"Host: localhost, port: 12000, request_uri: /ten-meg.bin"
                user     system      total        real
TCPSocket    0.030000   0.150000   0.180000 (  0.468867)
net/http    10.620000   8.630000  19.250000 ( 21.787785)
LB net/http 10.870000   8.900000  19.770000 ( 22.259448)
open-uri    16.400000  11.900000  28.300000 ( 39.834555)

As you can see, a raw TCPSocket is well over an order of magnitude
faster than net/http and friends (about 46x in wall-clock time). I'm
already using read_body and receiving the data in chunks, so I would
have expected much better performance. We're talking roughly 21MB/s for
TCPSocket versus roughly 470KB/s for net/http.
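
For reference, the throughput figures can be derived from the "real"
column of the benchmark output. This is only a small sketch: it assumes
the payload is exactly 10 MiB (bs=1024 x count=10240, as in the dd
command in the script), and throughput_mb_per_s is a hypothetical helper
name, not part of any library.

```ruby
# Payload size assumed from the dd invocation: 1024 bytes x 10240 blocks.
PAYLOAD_BYTES = 1024 * 10240

# Hypothetical helper: throughput in MiB per second for a given
# wall-clock time in seconds.
def throughput_mb_per_s(bytes, seconds)
  bytes / seconds / (1024.0 * 1024.0)
end

# Wall-clock ("real") timings copied from the benchmark output.
REAL_SECONDS = {
  "TCPSocket"   => 0.468867,
  "net/http"    => 21.787785,
  "LB net/http" => 22.259448,
  "open-uri"    => 39.834555
}

REAL_SECONDS.each do |label, seconds|
  printf("%-12s %8.2f MB/s\n", label, throughput_mb_per_s(PAYLOAD_BYTES, seconds))
end
```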

What's happening here? What can I do to fix it?

Any help appreciated.

Regards,

Luke.

#!/usr/bin/ruby

require 'net/http'
require 'open-uri'
require 'benchmark'
require 'webrick'   # lowercase: 'WEBrick' fails on case-sensitive filesystems
include WEBrick

uri = URI.parse("http://localhost:12000/ten-meg.bin")
sourceFolder = "/tmp/"

Kernel.system("dd if=/dev/random of=/tmp/ten-meg.bin bs=1024 count=10240")

port = 12000
server = HTTPServer.new(:Port => port, :DocumentRoot => sourceFolder)
# trap the signal for shutdown
trap("INT"){ server.shutdown }
pid = Kernel.fork {
  $stdout.reopen('/tmp/WEBrick.stdout')
  $stderr.reopen('/tmp/WEBrick.stderr')
  server.start

}

at_exit { Process.kill("INT", pid) }

Kernel.sleep 1

p "Host: #{uri.host}, port: #{uri.port}, request_uri: #{uri.request_uri}"

Benchmark.bm(10) do |time|
  out = File.new("/tmp/tcp.tar.bz2", "w")
  time.report("TCPSocket") do
    s = TCPSocket.open uri.host, uri.port
    s.write "GET #{uri.request_uri} HTTP/1.0\r\nHost: #{uri.host}\r\n\r\n"
    temp = s.read.split("\r\n\r\n", 2).last
    s.close
    out.write(temp)
  end
  out.close

  out = File.new("/tmp/net.tar.bz2", "w")
  time.report("net/http") do
    Net::HTTP.start uri.host, uri.port do |http|
      http.request_get(uri.request_uri) do |response|
        response.read_body do |segment|
          out.write(segment)
        end
      end
    end
  end
  out.close

  out = File.new("/tmp/luke.out", "w")
  time.report("LB net/http") do
    http = Net::HTTP.new(uri.host, uri.port)
    http.request_get(uri.path) { |response|
      response.read_body { |segment|
        out.write(segment)
      }
    }
  end
  out.close

  out = File.new("/tmp/uri.tar.bz2", "w")
  time.report("open-uri") do
    uri.open do |x|
      out.write(x.read)
    end
  end
  out.close
end
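
The TCPSocket case above reads the whole response into memory with a
single s.read. For large files, a streaming variant of the same idea
might look like the sketch below. It has not been benchmarked here. It
assumes an HTTP/1.0 response (no chunked transfer encoding, body ends
when the server closes the connection) and does not check the status
line. stream_get and the 64KB chunk size are hypothetical choices.

```ruby
require 'socket'

# Hypothetical helper: send an HTTP/1.0 GET over a raw socket and copy
# the response body to `out` in fixed-size chunks.
# Returns the number of body bytes written.
def stream_get(host, port, path, out, chunk_size = 64 * 1024)
  TCPSocket.open(host, port) do |s|
    s.write "GET #{path} HTTP/1.0\r\nHost: #{host}\r\n\r\n"

    # Skip the status line and headers, up to the blank line.
    while (line = s.gets)
      break if line == "\r\n"
    end

    # Copy the body until the server closes the connection.
    total = 0
    while (chunk = s.read(chunk_size))
      out.write(chunk)
      total += chunk.bytesize
    end
    total
  end
end
```

Usage would be along the lines of
File.open("/tmp/tcp-stream.bin", "wb") { |f| stream_get(uri.host, uri.port, uri.request_uri, f) },
where the output path is again only an example.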

-- 
Posted via http://www.ruby-forum.com/.