Issue #15933 has been updated by gareth (Gareth Adams).


phluid61 (Matthew Kerwin) wrote:
> 
> The entire registry is available as [XML](https://www.iana.org/assignments/media-types/media-types.xml) and each individual registry is available as (ironically) text/csv; e.g. https://www.iana.org/assignments/media-types/text.csv

Sorry, I wasn't clear. Yes, the registry itself is in XML/CSV (that's what the `mime_types_data` gem uses to build its dataset), but it doesn't include the "default charset parameter" or even details of which parameters are required, so that has to be manually parsed out of the RFC/MIME registration. That's exactly what I did to get that shortlist of "conflicts" above.

I assumed your suggestion to "ignore the protocol; and consult the IANA registry" meant "at runtime" and I was clarifying that there was no way for that to be possible. If you didn't mean that (which was probably the more obvious way to look at it) then there's no problem at all.

> I would suggest ignoring the scheme altogether

I'm happy to do this. The only other case handled in this file is an `ftp://` URI, and the FTP parsing here has no way to extract metadata from the transferred file. Specifically, it doesn't perform any file-extension-to-MIME-type mapping and doesn't parse a content type, so FTP URLs can't hit this branch of the code.

Thanks

----------------------------------------
Bug #15933: OpenURI: Assign default charset for HTTPS as well as HTTP
https://bugs.ruby-lang.org/issues/15933#change-78673

* Author: gareth (Gareth Adams)
* Status: Assigned
* Priority: Normal
* Assignee: akr (Akira Tanaka)
* Target version: 
* ruby -v: 
* Backport: 2.4: UNKNOWN, 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
Using `open-uri` to load a document in the following circumstances:

* The `Content-Type` header is `text/*` and *doesn't* specify a charset, e.g. `Content-Type: text/csv`
* The document is loaded from an `https://` URL

...will cause the resulting string to have `ASCII-8BIT` encoding.

As the [documentation for OpenURI#charset](https://github.com/ruby/ruby/blob/trunk/lib/open-uri.rb#L538-L560) mentions, [RFC2616/3.7.1](https://tools.ietf.org/html/rfc2616#section-3.7.1) says:

> When no explicit charset parameter is provided by the sender, media subtypes of the "text" type are defined to have a default charset value of "ISO-8859-1" when received via HTTP.

OpenURI takes this literally - only assigning ISO-8859-1 if `@base_uri.scheme` is *exactly* "http". This check was written [17 years ago](https://github.com/ruby/ruby/commit/3a20ed532b57da1e58287a5c53abe14400a085f4#diff-0f19cb99597e5fb90bfb937b22143b51R264) in 2002 even before TLS 1.1 was defined, and well before HTTPS was common.
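The check described above can be approximated as follows. This is a simplified sketch rather than the actual `open-uri` source: `default_charset` is a hypothetical helper, and the `Content-Type` parsing is reduced to a couple of regexes.

```ruby
require 'uri'

# Hypothetical helper approximating the check described above.
# Returns the charset to assume for a response, or nil if none applies.
def default_charset(content_type, base_uri)
  # An explicit charset parameter always wins.
  if (m = content_type.match(/;\s*charset\s*=\s*"?([^";\s]+)/i))
    return m[1].downcase
  end

  # RFC 2616 3.7.1 default, applied only when the scheme is "http".
  if content_type.match?(%r{\Atext/}i) && base_uri.scheme.to_s.match?(/\Ahttp\z/i)
    "iso-8859-1"
  end
end

http_charset  = default_charset("text/csv", URI("http://example.com/data.csv"))
https_charset = default_charset("text/csv", URI("https://example.com/data.csv"))
```

With this logic the `http://` URL gets the ISO-8859-1 default, while the `https://` URL gets no charset at all, which is why the resulting string ends up as binary.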

I believe this check should now also match the scheme "https". As [RFC2818/2](https://tools.ietf.org/html/rfc2818#section-2) says:

> Conceptually, HTTP/TLS is very simple. Simply use HTTP over TLS precisely as you would use HTTP over TCP
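
One way to express the proposed scheme check, as a sketch only (the attached patch may differ in detail). `http_family?` is a hypothetical name used here for illustration; it is not part of `open-uri`.

```ruby
require 'uri'

# Hypothetical predicate: true when the URI scheme is "http" or "https",
# so the RFC 2616 default charset would apply to both.
def http_family?(uri)
  uri.scheme.to_s.match?(/\Ahttps?\z/i)
end

urls = %w[http://example.com/ https://example.com/ ftp://example.com/file.txt]
results = urls.map { |u| http_family?(URI(u)) }
```

The anchored `\A...\z` pattern keeps the match strict, so schemes such as `ftp` remain excluded.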

1. Is this a suitable change to make?

2. I have a patch to fix the functionality (attached). What else do I need to specify in terms of documentation/tests? I'm happy to put more work into this, but it's my first contribution to Ruby core and I'd like some pointers. I've read through https://bugs.ruby-lang.org/projects/ruby/wiki/HowToReport

---Files--------------------------------
ruby-changes.patch (1.21 KB)

