Issue #2567 has been updated by naruse (Yui NARUSE).


chucke (Tiago Cardoso) wrote:
> Bitten by this as well. I'd go the route proposed earlier:
> 
> 1. By default, encode the body using the charset set in content-type header.

HTML's definition of encodings is a bit different from that of the usual encoding converters, as described in the WHATWG Encoding Standard:
https://encoding.spec.whatwg.org/

Also, the charset parameter has many aliases (labels), some of which differ from the usual encoding aliases:
https://encoding.spec.whatwg.org/#names-and-labels
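To make the label point concrete, here is a minimal sketch of a WHATWG-style label lookup in Ruby. The table `WHATWG_LABELS` and the helper `whatwg_encoding_for` are hypothetical names, only a handful of labels from the names-and-labels table are included, and mapping WHATWG's Shift_JIS to Ruby's Windows-31J is an assumption about the closest Ruby encoding. It shows how one label can mean different things: Ruby's own `Encoding.find("iso-8859-1")` gives ISO-8859-1, while the WHATWG table maps that label to windows-1252.

```ruby
# Hypothetical, partial table of WHATWG labels -> Ruby encoding names.
# Only a few entries from https://encoding.spec.whatwg.org/#names-and-labels
# are listed; a real implementation would need the whole table.
WHATWG_LABELS = {
  "utf-8"             => "UTF-8",
  "utf8"              => "UTF-8",
  "unicode-1-1-utf-8" => "UTF-8",
  "iso-8859-1"        => "Windows-1252", # WHATWG decodes latin1-style labels as windows-1252
  "latin1"            => "Windows-1252",
  "us-ascii"          => "Windows-1252",
  "shift_jis"         => "Windows-31J",  # assumption: closest Ruby encoding to WHATWG Shift_JIS
  "sjis"              => "Windows-31J",
  "x-sjis"            => "Windows-31J",
}.freeze

# Hypothetical helper: normalize a label roughly as the Encoding Standard
# does (trim surrounding whitespace, ASCII-lowercase), then look it up.
# Returns an Encoding, or nil when the label is not in the table.
def whatwg_encoding_for(label)
  name = WHATWG_LABELS[label.to_s.strip.downcase(:ascii)]
  name && Encoding.find(name)
end

p whatwg_encoding_for(" Latin1 ")   # windows-1252 per the WHATWG table
p Encoding.find("iso-8859-1")       # Ruby's own alias lookup: ISO-8859-1
```

A lookup table separate from Ruby's alias list is the simplest way to follow the standard's labels without changing what `Encoding.find` means elsewhere.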

> 2. Provide an option to disable this, to keep old behaviour.

How the option should be specified is the problem.
The encoding may differ per resource (URL / path),
so it would have to be specified in each get/post call.
But those methods already take header and data hash arguments...
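One way to pass such a per-request option without colliding with the existing positional header/data arguments would be a keyword argument. The sketch below is hypothetical: `get_with_encoding` and its `body_encoding:` keyword are not part of Net::HTTP; only `Net::HTTP#get(path, initheader)` and `String#force_encoding` are real. It tags the body bytes with the caller's chosen encoding (no transcoding), and leaving the keyword out keeps the old ASCII-8BIT behaviour.

```ruby
require "net/http"

# Hypothetical wrapper, NOT part of Net::HTTP. It keeps get's positional
# (path, initheader) arguments and adds the per-request encoding as a
# keyword, so it cannot be confused with the header hash.
#   body_encoding: nil -> old behaviour, body stays ASCII-8BIT
#   body_encoding: enc -> body is tagged with enc (bytes are not converted)
def get_with_encoding(http, path, initheader = nil, body_encoding: nil)
  response = http.get(path, initheader)
  body = response.body
  body = body.dup.force_encoding(body_encoding) if body_encoding && body
  [response, body]
end

# Usage (needs network access):
#   uri = URI.parse("http://www.hearya.com/feed/")
#   Net::HTTP.start(uri.host, uri.port) do |http|
#     _res, body = get_with_encoding(http, uri.request_uri, body_encoding: "UTF-8")
#     puts body.encoding
#   end
```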

----------------------------------------
Feature #2567: Net::HTTP does not handle encoding correctly
https://bugs.ruby-lang.org/issues/2567#change-68437

* Author: slide_rule (Ryan Sims)
* Status: Assigned
* Priority: Normal
* Assignee: naruse (Yui NARUSE)
* Target version: 
----------------------------------------
=begin
 The body string returned by an HTTP GET does not have its encoding set according to the charset parameter of the Content-Type header, nor does content_type report the charset. Example code demonstrating the incorrect behavior is below.
 
 #!/usr/bin/ruby -w
 # encoding: UTF-8
 
 require 'net/http'
 
 uri = URI.parse('http://www.hearya.com/feed/')
 result = Net::HTTP.start(uri.host, uri.port) {|http|
     http.get(uri.request_uri)
 }
 
 p result['content-type']     # "text/xml; charset=UTF-8" <- correct
 p result.content_type        # "text/xml" <- incorrect; truncates the charset field
 puts result.body.encoding    # ASCII-8BIT <- incorrect encoding, should be UTF-8
=end
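Until Net::HTTP handles this itself, a caller can work around the reported behaviour with existing API: `Net::HTTPHeader#content_type` drops the parameters, but `Net::HTTPHeader#type_params` still returns them. A minimal sketch, where `body_with_charset` is a hypothetical helper name; it assumes the charset label is one Ruby's `Encoding.find` understands, which uses Ruby's aliases rather than the WHATWG labels discussed above.

```ruby
require "net/http"

# Hypothetical helper: tag a response body with the charset from
# Content-Type. content_type returns only "type/subtype", so read the
# parameter through type_params instead.
def body_with_charset(response, body = response.body)
  charset = response.type_params["charset"]
  return body if body.nil? || charset.nil?

  encoding = begin
    Encoding.find(charset.delete('"')) # tolerate charset="UTF-8"
  rescue ArgumentError
    nil # label unknown to Ruby: leave the body as raw bytes
  end
  encoding ? body.dup.force_encoding(encoding) : body
end

# Usage with the request from the report (needs network access):
#   result = Net::HTTP.start(uri.host, uri.port) { |http| http.get(uri.request_uri) }
#   p result.type_params                     # includes "charset"
#   puts body_with_charset(result).encoding
```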