Issue #6492 has been updated by drbrain (Eric Hodel).


naruse (Yui NARUSE) wrote:
> drbrain (Eric Hodel) wrote:
> > Due to read_chunked, and persistent connections I don't see how to make this work.
> 
> Yeah, I thought adding an another layer, transport encoding decoder, but it is just an idea and I don't suggest it.

I had this idea too, but it would be a larger change. I hope we can create something simpler, but also usable.

> > > This read method return a string whose length is not clen, this is wrong.
> > > Other IO-like object for example Zlib::GzipReader returns a string whose length is clen.
> > > So Inflater should have a internal buffer and return the string whose length is just clen.
> > 
> > Upon review, I think this is OK.
> > 
> > RFC 2616 specifies that Content-Length and Content-Range (which are used for clen) refer to the transferred bytes and are used to read the correct amount of data from the response to maintain the persistent connection.  Net::HTTPResponse#read_body doesn't allow the user to specify the amount of bytes they wish to read, so returning more data to the user is OK.
> 
> Your patch hides content-encoding layer.
> Content-Length and Content-Range are the member of the layer.
> 
> Net:HTTPRequest#read is on the layer.

I'm confused.

There is Net::BufferedIO#read, but no Net::HTTPResponse#read.  There is Net::HTTPResponse#read_body which lets you read the entire body or chunks of unknown size.

I don't see a way for the user to say "read 10 bytes of the response body" without manually buffering:

  require 'net/http'
  
  req = Net::HTTP::Get.new '/'
  
  body_part = Net::HTTP.start 'localhost' do |http|
    buffer = ''
    target_size = 10
  
    http.request req do |res|
      res.read_body do |chunk|
        break if buffer.length == target_size
  
        buffer << chunk[0, target_size - buffer.length]
      end
    end
  
    buffer
  end
  
  p body_part

Since Net::HTTPResponse is not usable as an IO I don't think IO-like behavior should apply to Net::HTTPResponse::Inflater.

> A user of net/http can't know whether a request used content-encoding or not.

I am unsure what you mean by "can't".

Do you mean "a user of net/http must be able to tell content-encoding was present"?

When this patch is combined with #6494 they will not be able to know whether a request used content-encoding or not.  I think this is good, the user should not have to worry about how the bytes were transported.  (This behavior matches Net::HTTP#get).

If we want the user to be able to handle content-encoding themselves I think adding a Net::HTTP#compression = false (which will disable both #6492 and #6494) would be best.  We can also add an indicator on Net::HTTPResponse that decompression was performed.

For Content-Length with Content-Encoding, the Content-Length will be invalid.  I think this is OK because RFC 2616 doesn't contain an indicator of the decoded length and the user is most likely interested in the decoded body.

Content-Range with Content-Encoding requires special handling.  The compressed stream may start anywhere in the underlying block.  (For a deflate-based stream the user would need to manually reconstruct the complete response in order to inflate it.)  I think such users should disable compression.

> On such situation, it can't be a reason why hidden Content-Encoding layer effects the behavior of read method.

I agree that in RFC 2616 that Content-Encoding, Content-Length and Content-Range are all on the same layer, but without an IO-like interface for the Net::HTTPResponse body I don't think a restriction on the behavior of the read method should apply.  Since this API is entirely internal, I think it is OK if a future addition to the API needs to add buffering to be IO-like.
----------------------------------------
Feature #6492: Inflate all HTTP Content-Encoding: deflate, gzip, x-gzip responses by default
https://bugs.ruby-lang.org/issues/6492#change-27086

Author: drbrain (Eric Hodel)
Status: Assigned
Priority: Normal
Assignee: naruse (Yui NARUSE)
Category: lib
Target version: 2.0.0


=begin
This patch moves the compression-handling code from Net::HTTP#get to Net::HTTPResponse to allow decompression to occur by default on any response body.  (A future patch will set the Accept-Encoding on all requests that allow response bodies by default.)

Instead of having separate decompression code for deflate and gzip-encoded responses, (({Zlib::Inflate.new(32 + Zlib::MAX_WBITS)})) is used which automatically detects and inflated gzip-wrapped streams which allows for simpler processing of gzip bodies (no need to create a StringIO).
=end



-- 
http://bugs.ruby-lang.org/