Issue #2567 has been updated by Yui NARUSE.


Ricardo Amorim wrote:
> Yui NARUSE wrote:
> > Is such a string always ISO-8859-1, even outside the US and Western Europe?
> 
> Yes, ISO-8859-1 always fits. I'm mainly accessing Brazilian servers, so that explains it.

As I understand it, Brazil uses Portuguese, which can be written in ISO-8859-1.


Anyway, I found a passage about determining the encoding in the httpbis draft:
http://tools.ietf.org/html/draft-ietf-httpbis-p3-payload-17#section-4.2

   In practice, resource owners do not always properly configure their
   origin server to provide the correct Content-Type for a given
   representation, with the result that some clients will examine a
   response body's content and override the specified type.  Clients
   that do so risk drawing incorrect conclusions, which might expose
   additional security risks (e.g., "privilege escalation").
   Furthermore, it is impossible to determine the sender's intent by
   examining the data format: many data formats match multiple media
   types that differ only in processing semantics.  Implementers are
   encouraged to provide a means of disabling such "content sniffing"
   when it is used.

So, to discourage developers from content sniffing, net/http should set an encoding as far as is practical.
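
To make that concrete, here is a minimal sketch of labeling a body from the declared charset only, with no content sniffing. The helper name apply_declared_charset is hypothetical and is not part of net/http; it assumes the caller passes the raw Content-Type header value, and it leaves the body as binary whenever the charset is missing, unknown to Ruby, or does not match the bytes.

```ruby
# Hypothetical helper, not part of net/http: label a response body with the
# charset declared in a Content-Type header value, without sniffing the body.
def apply_declared_charset(body, content_type)
  # Pull "charset=..." out of the header value; quotes around it are optional.
  charset = content_type.to_s[/;\s*charset\s*=\s*"?([^";\s]+)"?/i, 1]
  return body unless charset           # nothing declared: leave as binary

  begin
    encoding = Encoding.find(charset)
  rescue ArgumentError                 # charset name Ruby does not know
    return body
  end

  labeled = body.dup.force_encoding(encoding)
  # Keep the binary original if the bytes are invalid in the declared charset.
  labeled.valid_encoding? ? labeled : body
end

raw  = "caf\xC3\xA9".b                 # binary bytes, as net/http returns them
text = apply_declared_charset(raw, 'text/xml; charset=UTF-8')
puts text.encoding                     # UTF-8
```

Falling back to the untouched binary string keeps the current behavior for servers that send no charset or a wrong one, so nothing is guessed on the client side.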
----------------------------------------
Feature #2567: Net::HTTP does not handle encoding correctly
http://redmine.ruby-lang.org/issues/2567

Author: Ryan Sims
Status: Assigned
Priority: Low
Assignee: Yui NARUSE
Category: lib
Target version: 2.0.0
ruby -v: ruby 1.9.1p376 (2009-12-07 revision 26041) [i686-linux]


=begin
 A string returned by an HTTP GET does not have its encoding set according to the charset parameter of the Content-Type header, nor does content_type report the charset. Example code demonstrating the incorrect behavior is below.
 
 #!/usr/bin/ruby -w
 # encoding: UTF-8
 
 require 'net/http'
 
 uri = URI.parse('http://www.hearya.com/feed/')
 result = Net::HTTP.start(uri.host, uri.port) {|http|
     http.get(uri.request_uri)
 }
 
 p result['content-type']     # "text/xml; charset=UTF-8" <- correct
 p result.content_type        # "text/xml" <- incorrect; truncates the charset field
 puts result.body.encoding    # ASCII-8BIT <- incorrect encoding, should be UTF-8
=end
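
For reference, the charset that content_type leaves out can still be read through Net::HTTPHeader#type_params. The sketch below builds a response object locally instead of contacting the server from the report, so the header value is an assumption copied from the example above.

```ruby
require 'net/http'

# Build a response object by hand (no network access) and give it the
# Content-Type header shown in the report.
res = Net::HTTPOK.new('1.1', '200', 'OK')
res['Content-Type'] = 'text/xml; charset=UTF-8'

puts res.content_type              # text/xml  (media type only)

# type_params returns the header parameters as a Hash keyed by name.
charset = res.type_params['charset']
puts charset                       # UTF-8
```

Note that type_params does not strip quotes, so a header such as charset="UTF-8" would need extra handling before the value is passed to String#force_encoding.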
