Issue #2567 has been updated by Eric Hodel.


=begin
What should the user expect when the response headers are wrong?  For example, the response Content-Type claims ISO-8859-1 but the content was UTF-8? (Yes, this really happens)

If Net::HTTP forces the encoding to ISO-8859-1 you will have undetectably garbled text:

  $ ruby -e 's = "дл"; s.force_encoding Encoding::ISO_8859_1; puts s.valid_encoding?, s.encode(Encoding::UTF_8)'
  true
  

I think leaving the response as binary encoding and allowing the user to apply the proper heuristics to determine the encoding for their is the best way.

If you wish to read HTML documents, perhaps mechanize is a better choice as it implements a heuristic similar to the one in HTML5 to find the encoding of the document despite potential lies from the server or document header.

=end

----------------------------------------
Feature #2567: Net::HTTP does not handle encoding correctly
http://redmine.ruby-lang.org/issues/2567

Author: Ryan Sims
Status: Assigned
Priority: Low
Assignee: Yui NARUSE
Category: lib
Target version: 2.0.0
ruby -v: ruby 1.9.1p376 (2009-12-07 revision 26041) [i686-linux]


=begin
 A string returned by an HTTP get does not have its encoding set appropriately with the charset field, nor does the content_type report the charset. Example code demonstrating incorrect behavior is below.
 
 #!/usr/bin/ruby -w
 # encoding: UTF-8
 
 require 'net/http'
 
 uri = URI.parse('http://www.hearya.com/feed/')
 result = Net::HTTP.start(uri.host, uri.port) {|http|
     http.get(uri.request_uri)
 }
 
 p result['content-type']     # "text/xml; charset=UTF-8" <- correct
 p result.content_type        # "text/xml" <- incorrect; truncates the charset field
 puts result.body.encoding    # ASCII-8BIT <- incorrect encoding, should be UTF-8
=end



-- 
http://redmine.ruby-lang.org