Issue #2567 has been updated by jonathan rochkind.


how else is a developer/client going to figure out the encoding EXCEPT from the HTTP server response content-type?  You pretty much need to set the encoding in ruby 1.9 for the response to be useable.  If the Content-Type isn't trustworthy enough for Net::HTTP... what else is available for the developer?  In the vast majority of cases, the developer will have to map from content-type.  

If Net::HTTP doesn't do it, then every developer has to figure out how to do it on their own and write extra code to do it. Setting the encoding from the HTTP response is going to be by far the most common use pattern. I think Net::HTTP should do it.   

If there are certain edge cases where a knowledgeable developer knows s/he wants something different, then perhaps a "set_encoding?" attribute can be provided set to true by default, but settable to false. Alternately, the developer can always call force_encoding themselves as well. 

The knowledgeable developer perhaps needs ways to disable this behavior, but the beginning developer needs reasonable behavior as a default, and right now they don't get it -- character encoding is one of the most confusing issues in ruby 1.9 (esp to debug), and the more standard libraries set it appropriately when appropriate information is available, the less of a barrier this is to getting started with ruby 1.9.  When you're reading from an arbitrary data stream without encoding metadata (like the file system), it's one thing, the standard libraries can't possibly know what to do.  But when you're reading from a stream with encoding metadata (HTTP headers), using a library specifically designed for that kind of stream (stdlib Net::HTTP), it's unforgiveable that ruby makes each (newbie!) developer reinvent the wheel again and again. 
----------------------------------------
Feature #2567: Net::HTTP does not handle encoding correctly
http://redmine.ruby-lang.org/issues/2567

Author: Ryan Sims
Status: Assigned
Priority: Low
Assignee: Yui NARUSE
Category: lib
Target version: 1.9.x
ruby -v: ruby 1.9.1p376 (2009-12-07 revision 26041) [i686-linux]


=begin
 A string returned by an HTTP get does not have its encoding set appropriately with the charset field, nor does the content_type report the charset. Example code demonstrating incorrect behavior is below.
 
 #!/usr/bin/ruby -w
 # encoding: UTF-8
 
 require 'net/http'
 
 uri = URI.parse('http://www.hearya.com/feed/')
 result = Net::HTTP.start(uri.host, uri.port) {|http|
     http.get(uri.request_uri)
 }
 
 p result['content-type']     # "text/xml; charset=UTF-8" <- correct
 p result.content_type        # "text/xml" <- incorrect; truncates the charset field
 puts result.body.encoding    # ASCII-8BIT <- incorrect encoding, should be UTF-8
=end



-- 
http://redmine.ruby-lang.org