Issue #2567 has been updated by jrochkind (jonathan rochkind).


It seems like encoding on _headers_ is a different question than encoding on bodies. 

Perhaps the encoding on _headers_ should be left as ASCII-8BIT -- I'm not sure the spec even says that the charset in the Content-Type header is supposed to apply to the other headers.

But it is clear to me that the encoding on the body should be set from the headers, when possible.

Most languages are not as explicit about encoding as Ruby 1.9. In most languages you can get away with ignoring encoding, and at worst get garbled text -- in Ruby 1.9 you'll get exceptions raised.
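
To make that concrete, here is a minimal sketch of the kind of exception I mean. The strings and the try_concat helper are made up for illustration, and nothing here touches the network:

```ruby
# Combine two strings, returning the exception instead of raising it
# so that both outcomes can be inspected.
def try_concat(a, b)
  a + b
rescue Encoding::CompatibilityError => e
  e
end

# The bytes of "café" as they would come off the wire, tagged ASCII-8BIT.
body = "caf\xC3\xA9".b
# A UTF-8 string from the application that also contains non-ASCII text.
label = "r\u00E9sum\u00E9: "

broken = try_concat(label, body)
fixed  = try_concat(label, body.dup.force_encoding("UTF-8"))

puts broken.class   # an Encoding::CompatibilityError
puts fixed.encoding # concatenation works once the body is tagged UTF-8
```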

'Implementers are encouraged to provide a means of disabling such "content sniffing" when it is used.'

Fortunately, there is a clear way to do that -- we're not talking about Net::HTTP doing any transcoding, only about it setting the encoding value. You want to 'disable' that? Just call

    response.body.force_encoding("ASCII-8BIT") 

to throw out whatever encoding it determined from the headers. 
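
A small sketch of why that is safe: force_encoding only changes the encoding tag on the string and leaves the bytes alone. The retag helper is a hypothetical name used just for this illustration:

```ruby
# Return a copy of str carrying a different encoding tag.
# force_encoding does not transcode, so the bytes stay identical.
def retag(str, encoding)
  str.dup.force_encoding(encoding)
end

utf8   = "caf\u00E9"
binary = retag(utf8, "ASCII-8BIT")

puts binary.encoding            # the tag is now the binary encoding
puts binary.bytes == utf8.bytes # the underlying bytes are unchanged
```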

Whatever problems would be caused by HTTP servers sending a bad Content-Type header and Net::HTTP believing it -- wouldn't those same problems also be caused by leaving the encoding as ASCII-8BIT? If an individual client wants to use heuristics to guess the encoding, there's nothing stopping them -- just call force_encoding("ASCII-8BIT"), apply whatever heuristics you like, and call force_encoding again with the result at the end.
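
As a rough sketch of what such a client-side heuristic could look like (guess_encoding and its candidate list are my own invention here, not anything Net::HTTP provides): throw away the tag, then pick the first candidate encoding under which the bytes are valid.

```ruby
# Hypothetical heuristic: ignore whatever tag the body carries and return
# the first candidate encoding in which the raw bytes are well-formed.
# Falls back to ASCII-8BIT when no candidate fits.
def guess_encoding(body, candidates = [Encoding::UTF_8, Encoding::ISO_8859_1])
  raw = body.dup.force_encoding(Encoding::ASCII_8BIT)
  candidates.find { |enc| raw.dup.force_encoding(enc).valid_encoding? } ||
    Encoding::ASCII_8BIT
end

body = "caf\xC3\xA9".b                        # pretend this came from a response
body.force_encoding(guess_encoding(body))     # apply the guess at the end
puts body.encoding
```

A validity check like this is a very weak heuristic (every byte sequence is valid ISO-8859-1, for instance), which is part of why trusting the header makes a better default.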

But by default, Net::HTTP should assume that the spec is being followed and the Content-Type header is correct.
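
Roughly, the default behavior I'm asking for could look like the sketch below. This is not how Net::HTTP is implemented; charset_from and tag_body_encoding are hypothetical helpers, and the regexp is a simplification of real Content-Type parsing:

```ruby
# Pull the charset parameter out of a Content-Type header value, if any.
def charset_from(content_type)
  return nil unless content_type
  m = content_type.match(/;\s*charset\s*=\s*"?([^\s;"]+)"?/i)
  m && m[1]
end

# Tag the body with the declared charset. No transcoding happens; if the
# header is missing or names an encoding Ruby does not know, the body is
# returned with its tag untouched.
def tag_body_encoding(body, content_type)
  name = charset_from(content_type)
  return body unless name
  body.force_encoding(Encoding.find(name))
rescue ArgumentError # Encoding.find raises this for unknown names
  body
end

body = "caf\xC3\xA9".b
tag_body_encoding(body, "text/xml; charset=UTF-8")
puts body.encoding
```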
----------------------------------------
Feature #2567: Net::HTTP does not handle encoding correctly
https://bugs.ruby-lang.org/issues/2567#change-26671

Author: slide_rule (Ryan Sims)
Status: Assigned
Priority: Low
Assignee: naruse (Yui NARUSE)
Category: lib
Target version: 2.0.0


=begin
 A string returned by an HTTP GET does not have its encoding set according to the charset field, nor does content_type report the charset. Example code demonstrating the incorrect behavior is below.
 
 #!/usr/bin/ruby -w
 # encoding: UTF-8
 
 require 'net/http'
 
 uri = URI.parse('http://www.hearya.com/feed/')
 result = Net::HTTP.start(uri.host, uri.port) {|http|
     http.get(uri.request_uri)
 }
 
 p result['content-type']     # "text/xml; charset=UTF-8" <- correct
 p result.content_type        # "text/xml" <- incorrect; truncates the charset field
 puts result.body.encoding    # ASCII-8BIT <- incorrect encoding, should be UTF-8
=end
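
For reference, the charset that content_type leaves out can still be read through type_params on the response. The sketch below builds a response object by hand, purely as an assumption for illustration so that it runs without a network connection:

```ruby
require 'net/http'

# Build a bare 200 response offline and set the header from the report.
res = Net::HTTPOK.new('1.1', '200', 'OK')
res['content-type'] = 'text/xml; charset=UTF-8'

p res.content_type           # media type only, without parameters
p res.type_params            # Content-Type parameters as a Hash
p res.type_params['charset'] # the declared charset, if present
```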



-- 
http://bugs.ruby-lang.org/