Issue #13926 has been updated by usa (Usaku NAKAMURA).

Backport changed from 2.2: DONTNEED, 2.3: REQUIRED, 2.4: DONE to 2.2: DONTNEED, 2.3: DONE, 2.4: DONE

ruby_2_3 r62133 merged revision(s) 60021.

----------------------------------------
Bug #13926: Non UTF response headers raise an Argument error since 2.4.2p198
https://bugs.ruby-lang.org/issues/13926#change-70076

* Author: petehamilton (Pete Hamilton)
* Status: Closed
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: ruby 2.4.2p198 (2017-09-14 revision 59899) [x86_64-darwin16]
* Backport: 2.2: DONTNEED, 2.3: DONE, 2.4: DONE
----------------------------------------
When setting headers using `Net::HTTPHeader#add_field` or `Net::HTTPHeader#[]=` in v2.4.2 with a value that is not valid UTF-8, an `ArgumentError (invalid byte sequence in UTF-8)` is raised.

In 2.4.1, this behaviour didn't exist and it looks like it was introduced in one of the revisions associated with https://bugs.ruby-lang.org/issues/13852, where the header value is matched against a regular expression to prevent newlines.
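
The failure mode can be reproduced outside `Net::HTTP`. The following is a minimal sketch, assuming the newline check is a plain `/[\r\n]/` match on the header value. The method names `contains_crlf_naive?` and `contains_crlf_binary?` are hypothetical and not part of `Net::HTTP`. Matching against a binary copy of the string (`String#b`) is shown as one possible way to avoid the error, not necessarily what the merged revision does.

```ruby
# A header value with a byte that is not valid UTF-8
# (0xE9 is "é" in ISO-8859-1), tagged as UTF-8.
value = "caf\xE9".dup.force_encoding(Encoding::UTF_8)

# Hypothetical check: match the regexp directly against the value.
# A regexp match against a string with a broken encoding raises ArgumentError.
def contains_crlf_naive?(val)
  !(val =~ /[\r\n]/).nil?
end

# Hypothetical check: match against a binary (ASCII-8BIT) copy instead.
# Every byte sequence is valid in ASCII-8BIT, so the match cannot raise
# because of the encoding.
def contains_crlf_binary?(val)
  !(val.b =~ /[\r\n]/n).nil?
end

begin
  contains_crlf_naive?(value)
rescue ArgumentError => e
  puts "naive check raised: #{e.class}"
end

puts "binary check found CR/LF: #{contains_crlf_binary?(value)}"
```

The binary variant still rejects values that contain CR or LF, so the newline protection is kept while the bytes of the value are left untouched.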

Previously, `Net::HTTP` accepted non-UTF-8 header values and returned them as strings containing invalid UTF-8, leaving it to the caller to handle them. With this change, the caller can no longer handle non-UTF-8 header values at all, because `Net::HTTP` raises an error instead.
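
For illustration, this is roughly what caller-side handling could look like when the raw value is handed back unchanged. It is a sketch under assumptions: `header_value_to_utf8` is a hypothetical helper, not part of `Net::HTTP`, and it assumes that bytes which are not valid UTF-8 are ISO-8859-1, which may not hold for every server.

```ruby
# Hypothetical helper: turn a raw header value into a valid UTF-8 string.
def header_value_to_utf8(raw)
  # If the bytes already form valid UTF-8, keep them as they are.
  utf8 = raw.b.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?

  # Assumption: anything else is ISO-8859-1. Every byte maps to a
  # character in that charset, so this transcoding cannot fail.
  raw.b.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

raw = "caf\xE9".b  # raw bytes as they might arrive on the wire
puts header_value_to_utf8(raw).valid_encoding?
```

A caller that only needs to pass the value along could instead keep it as a binary string and never decode it.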

[RFC2616](https://tools.ietf.org/html/rfc2616#section-4.2) allowed HTTP header field content to be made up of any non-whitespace octets. Because of this, [RFC7230](https://tools.ietf.org/html/rfc7230#section-3.2.4) makes an allowance for all characters in the ISO-8859-1 charset (both lower and extended ASCII characters).

Specifically, this section of RFC7230 suggests that although ideally response header values would be compatible with UTF-8, we can't assume this to be the case.

>   Historically, HTTP has allowed field content with text in the
>   ISO-8859-1 charset [ISO-8859-1], supporting other charsets only
>   through use of [RFC2047] encoding.  In practice, most HTTP header
>   field values use only a subset of the US-ASCII charset [USASCII].
>   Newly defined header fields SHOULD limit their field values to
>   US-ASCII octets.  A recipient SHOULD treat other octets in field
>   content (obs-text) as opaque data.

I'm not entirely sure where to go from here or what the fix is, but given this is a behaviour change, it'd be great to hear your thoughts.

---Files--------------------------------
net_http_utf8_tests.patch (1.14 KB)


-- 
https://bugs.ruby-lang.org/
