Issue #2567 has been updated by Marcel Cary.


I'm also encountering this issue after upgrading from Ruby 1.8.7 to 2.0.0.  It was difficult to troubleshoot because the error didn't surface until Net::HTTP was long gone from the stack, when my code tried to concatenate the response body with a UTF-8 string.  It was also intermittent: many responses did not trigger the error.
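
For anyone trying to reproduce the symptom without a network, here is a minimal sketch. The two "bodies" are stand-ins I made up to simulate what Net::HTTP hands back (raw bytes tagged ASCII-8BIT). It assumes the UTF-8 string being concatenated contains non-ASCII characters, which I believe is what makes the failure depend on the response content.

```ruby
# Simulated response bodies: raw bytes tagged ASCII-8BIT (binary).
ascii_body = "plain ascii".dup.force_encoding(Encoding::ASCII_8BIT)
utf8_body  = "caf\u00e9".dup.force_encoding(Encoding::ASCII_8BIT)

# A UTF-8 string that itself contains non-ASCII characters.
prefix = "R\u00e9sum\u00e9: "

# ASCII-only binary data is compatible with UTF-8, so this succeeds.
ok = prefix + ascii_body

# Non-ASCII binary data is not compatible, so this raises.
error = begin
  prefix + utf8_body
  nil
rescue Encoding::CompatibilityError => e
  e
end

# Retagging the bytes as UTF-8 (no transcoding) makes the concatenation work.
fixed = prefix + utf8_body.dup.force_encoding(Encoding::UTF_8)
```

Bodies that happen to be pure ASCII never raise, which would explain why only some responses triggered the error.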

Although I confess that I've never had to deal with a server that misidentifies the charset of the response, I lean toward Hugo Corbucci's assessment: Net::HTTP should use the Content-Type header and possibly the body.  But if that's too risky, how about just adding a hook, something like Net::HTTP.charset_guesser, which defaults to tagging the body as ASCII-8BIT, but can easily be set to use the HTTP header or scan the body instead?  That way the default behavior doesn't change, but we get an easy mechanism for enabling Corbucci's option #2 or #3 application-wide on an opt-in basis, for example by loading a gem.
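
To make the header-based option concrete, here is a rough sketch of what such a guesser could do. Both method names (charset_from_content_type and tag_body) are hypothetical and are not part of Net::HTTP. The sketch assumes the charset is taken only from the Content-Type header and that the body falls back to ASCII-8BIT when the charset is missing, unknown to Ruby, or does not match the bytes.

```ruby
# Hypothetical helper, not part of Net::HTTP: extract the charset parameter
# from a Content-Type header value and return the matching Encoding,
# or nil when it is absent or unknown to Ruby.
def charset_from_content_type(header_value)
  return nil if header_value.nil?
  match = header_value.match(/;\s*charset\s*=\s*"?([^\s;"]+)"?/i)
  return nil unless match
  Encoding.find(match[1])
rescue ArgumentError
  # Encoding.find raises ArgumentError for names Ruby does not know.
  nil
end

# Hypothetical helper: return a copy of the body tagged with the declared
# charset. The bytes are only relabeled, never transcoded. If the bytes are
# not valid in the declared encoding, keep the binary fallback instead.
def tag_body(body, header_value, fallback = Encoding::ASCII_8BIT)
  encoding = charset_from_content_type(header_value) || fallback
  tagged = body.dup.force_encoding(encoding)
  tagged.valid_encoding? ? tagged : body.dup.force_encoding(fallback)
end

raw = "caf\u00e9".dup.force_encoding(Encoding::ASCII_8BIT)
tagged = tag_body(raw, "text/xml; charset=UTF-8")
```

Keeping ASCII-8BIT as the fallback means a response with no usable charset behaves exactly as it does today.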

If a patch is what's stopping this feature from being implemented, I'm happy to provide one.

----------------------------------------
Feature #2567: Net::HTTP does not handle encoding correctly
https://bugs.ruby-lang.org/issues/2567#change-49051

* Author: Ryan Sims
* Status: Assigned
* Priority: Low
* Assignee: Yui NARUSE
* Category: lib
* Target version: next minor
----------------------------------------
=begin
 The body string returned by an HTTP GET does not have its encoding set according to the charset parameter of the Content-Type header, nor does content_type report the charset. Example code demonstrating the incorrect behavior is below.
 
 #!/usr/bin/ruby -w
 # encoding: UTF-8
 
 require 'net/http'
 
 uri = URI.parse('http://www.hearya.com/feed/')
 result = Net::HTTP.start(uri.host, uri.port) {|http|
     http.get(uri.request_uri)
 }
 
 p result['content-type']     # "text/xml; charset=UTF-8" <- correct
 p result.content_type        # "text/xml" <- incorrect; truncates the charset field
 puts result.body.encoding    # ASCII-8BIT <- incorrect encoding, should be UTF-8
=end
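
On the content_type point in the report above: as far as I can tell, the parameters that content_type drops can still be read through type_params, which Net::HTTP responses get from Net::HTTPHeader. A small sketch follows. It builds the response object by hand, which is my own shortcut to avoid a network call and not something from the report.

```ruby
require 'net/http'

# Build a response object by hand so no network access is needed.
response = Net::HTTPOK.new('1.1', '200', 'OK')
response['content-type'] = 'text/xml; charset=UTF-8'

media_type = response.content_type             # media type only, no parameters
charset    = response.type_params['charset']   # charset parameter, nil if absent
```

This does not change the encoding of the body; it only shows where the declared charset can be found.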




-- 
https://bugs.ruby-lang.org/