Issue #2567 has been updated by Fabio Pugliese Ornellas.


Hello,

Here are my two cents:

~~~
class Net::HTTPResponse
  def read_body(dest = nil, &block)
    if @read
      raise IOError, "#{self.class}\#read_body called twice" if dest or block
      return @body
    end
    # Force encoding for streamed response bodies
    final_block = if block
      proc do |chunk|
        if type_params['charset']
          block.call(chunk.force_encoding(type_params['charset']))
        else
          block.call(chunk)
        end
      end
    end
    to = procdest(dest, final_block)
    stream_check
    if @body_exist
      read_body_0 to
      @body = to
    else
      @body = nil
    end
    @read = true
    # Force encoding for String @body
    if type_params['charset'] && @body.respond_to?(:force_encoding)
      @body.force_encoding(type_params['charset'])
    end
    @body
  end
end
~~~

These changes:

* Make Net::HTTP respect https://tools.ietf.org/html/rfc7231#section-3.1.1.2
* Work for both cases: Net::HTTPResponse#body and Net::HTTPResponse#read_body.
* If the server is misconfigured and the Content-Type charset does not match the actual encoding of the response body, defer any encoding exceptions to the point where the body is processed, outside Net::HTTP code, which makes the problem clearer to the user.
* Still allow users to call force_encoding themselves to work around a server misconfiguration.
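
The last two points can be illustrated with plain String calls, without touching Net::HTTP. This is only a sketch under an assumption: the string below stands in for a response body whose bytes are ISO-8859-1 while the server declares charset=UTF-8. All variable names are made up for the illustration.

```ruby
# Stand-in for a response body: "cafe" with an accented e, encoded as
# ISO-8859-1 (the accented e is the single byte 0xE9).
# Assumption: the server wrongly declared charset=UTF-8 for these bytes.
body = "caf\xE9".b

# What the patch would do: label the bytes with the declared charset.
# force_encoding only relabels and does not validate, so nothing is raised.
body.force_encoding("UTF-8")
valid_after_patch = body.valid_encoding?   # false: a lone 0xE9 is not valid UTF-8

# The problem surfaces later, in user code that actually processes the string.
deferred_error = begin
  body.encode("UTF-16")
  nil
rescue Encoding::InvalidByteSequenceError => e
  e
end

# The user can still override the label to work around the server.
fixed = body.dup.force_encoding("ISO-8859-1").encode("UTF-8")
```

Here `deferred_error` holds the exception raised by `encode`, and `fixed` is a valid UTF-8 string, which is the kind of manual bypass the last bullet refers to.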

As I understand it, Ruby libraries should follow the RFCs by default and let users get real exceptions when something is wrong. As it stands, body strings are inconsistent: sometimes I get ASCII-8BIT, sometimes UTF-8, depending on which code path inside Net::HTTP runs, and the RFC is not followed.
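
For comparison, this is roughly what callers have to do by hand today to get a consistently labelled body. It is a minimal sketch: `body_in_declared_charset` is a hypothetical helper name, not part of Net::HTTP, and falling back to the untouched body when the charset is missing or unknown is my own assumption about reasonable behavior.

```ruby
require 'net/http'

# Hypothetical helper (not part of Net::HTTP): returns a copy of the body
# labelled with the charset declared in Content-Type, if there is one.
# `raw` defaults to response.body; it can be passed explicitly for testing.
def body_in_declared_charset(response, raw = response.body)
  charset = response.type_params['charset']
  return raw if raw.nil? || charset.nil?

  # The parameter value may be quoted, e.g. charset="utf-8".
  raw.dup.force_encoding(charset.delete('"'))
rescue ArgumentError
  # Unknown encoding name: leave the body as it was (assumed fallback).
  raw
end
```

With a response `res` obtained from `http.get`, `body_in_declared_charset(res)` returns a relabelled copy and leaves `res.body` itself untouched.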

I believe this change might create problems for code that "works by coincidence" because of the current behavior. For example, if a misconfigured server sets the charset to iso8859-1 but the response body is actually UTF-8, that currently works, but it will break with the proposed patch. In that case, however, it is a server issue, not a client-side one. It is certainly a risk, but not following the RFCs is already bad as it is.
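
To make that risk concrete, here is a small sketch of how the breakage might look, using only String methods. The scenario (UTF-8 bytes declared as ISO-8859-1) is the one described above; the variable names and the way the "coincidental" code relabels the bytes are assumptions for illustration.

```ruby
# Assumption: the server sends UTF-8 bytes for "cafe" with an accented e,
# but declares charset=ISO-8859-1 in Content-Type.
wire_bytes = "caf\u00E9".b

# One way existing code may "work by coincidence": the bytes end up
# treated as UTF-8, which happens to be what they really are.
by_coincidence = wire_bytes.dup.force_encoding("UTF-8")

# With the proposed patch the body would carry the declared charset instead.
patched = wire_bytes.dup.force_encoding("ISO-8859-1")

# No exception is raised, because every byte sequence is valid ISO-8859-1 ...
patched_valid = patched.valid_encoding?        # true

# ... but transcoding treats each byte as its own character, so the
# accented e (bytes 0xC3 0xA9) becomes two characters, U+00C3 and U+00A9.
transcoded = patched.encode("UTF-8")
same_text = (transcoded == by_coincidence)     # false
```

So in this particular direction the failure is silent garbling rather than an exception, and the fix on the client side is still an explicit `force_encoding("UTF-8")`.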


----------------------------------------
Feature #2567: Net::HTTP does not handle encoding correctly
https://bugs.ruby-lang.org/issues/2567#change-57357

* Author: Ryan Sims
* Status: Assigned
* Priority: Normal
* Assignee: Yui NARUSE
----------------------------------------
=begin
 A string returned by an HTTP get does not have its encoding set appropriately with the charset field, nor does the content_type report the charset. Example code demonstrating incorrect behavior is below.
 
 #!/usr/bin/ruby -w
 # encoding: UTF-8
 
 require 'net/http'
 
 uri = URI.parse('http://www.hearya.com/feed/')
 result = Net::HTTP.start(uri.host, uri.port) {|http|
     http.get(uri.request_uri)
 }
 
 p result['content-type']     # "text/xml; charset=UTF-8" <- correct
 p result.content_type        # "text/xml" <- incorrect; truncates the charset field
 puts result.body.encoding    # ASCII-8BIT <- incorrect encoding, should be UTF-8
=end



