Issue #2567 has been updated by Carsten Bormann.


The previous comments appear to be confused.  If the web server indicates a charset in an HTTP Content-Type header, this takes precedence over everything that may be in the body, so it is always correct to set the Ruby "encoding" from that.  There is no "guessing" of the charset in that case.
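
To make the header rule concrete, here is a minimal sketch in Ruby. The helper name apply_declared_charset is hypothetical and is not part of Net::HTTP; it assumes the caller passes in the raw body and the Content-Type header value as strings.

```ruby
# Hypothetical helper (not a Net::HTTP API): label a response body with the
# charset declared in the Content-Type header value, if there is one.
def apply_declared_charset(body, content_type)
  # Parameter names are case-insensitive; the value may be quoted.
  match = content_type.to_s.match(/;\s*charset\s*=\s*"?([^\s;"]+)"?/i)
  return body unless match # no charset declared: leave the body untouched

  begin
    encoding = Encoding.find(match[1])
  rescue ArgumentError
    return body # charset label unknown to Ruby: leave the body untouched
  end

  # force_encoding only relabels the bytes; it does not transcode them.
  body.dup.force_encoding(encoding)
end
```

With a header value such as "text/html; charset=UTF-8", the returned string carries Encoding::UTF_8; without a charset parameter the body comes back unchanged, so a binary body stays "ASCII-8BIT".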

Only if no charset is given in HTTP would a browser resort to looking into the content for clues (META/HTTP-EQUIV in HTML4, META/CHARSET in HTML5).
If there are none, a locale-specific default is likely to apply.  (This is contrary to the obsolete specification in RFC 2616, which mandates ISO-8859-1 as the standard default for this case; that requirement is being removed in httpbis.)
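
A rough sketch of that fallback order, again in Ruby. The helper name fallback_charset, the 1024-byte scan window, and the "windows-1252" default are all assumptions made for illustration; a real browser uses a more elaborate algorithm and picks its default from the user's locale.

```ruby
# Hypothetical helper: pick a charset label for an HTML body when the HTTP
# header declared none. Returns the label as a String.
def fallback_charset(body, locale_default = "windows-1252")
  # Scan only the start of the document, as raw bytes (window size assumed).
  head = body.byteslice(0, 1024).b

  # One pattern covers both forms:
  #   HTML5: <meta charset="utf-8">
  #   HTML4: <meta http-equiv="Content-Type" content="text/html; charset=...">
  if (m = head.match(/<meta[^>]+charset\s*=\s*["']?([\w.:-]+)/i))
    return m[1]
  end

  # No clue in the content: fall back to the (assumed) locale default.
  locale_default
end
```

This would only be consulted when the Content-Type header carries no charset parameter; a charset from the header always wins.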

I'm neutral on whether Net::HTTP should attempt to deliver bodies in the correct Ruby "encoding" or always label them as binary ("ASCII-8BIT").  The user of the body may want to apply content-type sniffing regardless of what the body is declared to be (e.g., to detect videos, zip/rar files, etc.).  I just want to point out that the above comments give an incorrect reason for not operating on the charset attribute of a Content-Type HTTP header.

A good, stable, fully vetted tutorial on character encoding on the Web is available at

http://www.w3.org/International/tutorials/tutorial-char-enc/

The sniffing issue is the subject of an ongoing IETF Internet-Draft:

http://tools.ietf.org/html/draft-abarth-mime-sniff
----------------------------------------
http://redmine.ruby-lang.org/issues/show/2567
