Issue #15979 has been updated by jeremyevans0 (Jeremy Evans).

Status changed from Open to Rejected
File uri-parse-validate-15979.patch added

This is not a bug, and not related to validation.  The reason for the behavior is that `URI.parse` uses an RFC 3986 parser, while `URI::HTTPS.build` uses an RFC 2396 parser.  If you call `URI::HTTPS.new` with the RFC 3986 parser and ask it to validate the components, you get a valid URI:

```ruby
URI::HTTPS.new(
  *URI::RFC3986_PARSER.split(
    "https://-._~%2C!$&'()*+,;=:@-._~%2C!$&'()*+,;=:/foo?/-._~%2C!$&'()*+,;=:@/?"),
  URI::RFC3986_PARSER, true)
```
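
The splat works because `split` returns its components in the same positional order that the `URI::Generic` initializer accepts them, leaving only the parser and the validation flag to append. A minimal sketch with a simpler URI (the `labeled_split` helper, the `COMPONENTS` constant, and the example URI are made up for illustration):

```ruby
require "uri"

# Positional order shared by the parser's split result and URI::Generic#initialize.
COMPONENTS = %i[scheme userinfo host port registry path opaque query fragment].freeze

# Hypothetical helper: pair each split component with its name.
def labeled_split(uri)
  COMPONENTS.zip(URI::RFC3986_PARSER.split(uri)).to_h
end

labeled_split("https://user@example.com:8443/foo?bar#baz").each do |name, value|
  puts "#{name}: #{value.inspect}"
end
```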

The issue here is that the hostname you provide in the URI is invalid in RFC 2396 but valid in RFC 3986.

RFC 2396 ABNF:

```
host          = hostname | IPv4address
hostname      = *( domainlabel "." ) toplabel [ "." ]
domainlabel   = alphanum | alphanum *( alphanum | "-" ) alphanum
toplabel      = alpha | alpha *( alphanum | "-" ) alphanum
```

RFC 3986 ABNF:

```
host          = IP-literal / IPv4address / reg-name
reg-name      = *( unreserved / pct-encoded / sub-delims )
unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
pct-encoded   = "%" HEXDIG HEXDIG
sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
                    / "*" / "+" / "," / ";" / "="
```
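
As a rough illustration, the two grammars can be translated into regular expressions and run against the host in question. This is a hand-written sketch, not the patterns the `uri` library actually uses; the constant names are made up, and the RFC 3986 side covers only `reg-name` (not `IP-literal` or `IPv4address`):

```ruby
# RFC 2396: host = hostname | IPv4address
DOMAINLABEL  = /[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?/
TOPLABEL     = /[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?/
IPV4ADDRESS  = /\d+\.\d+\.\d+\.\d+/
RFC2396_HOST = /\A(?:(?:#{DOMAINLABEL}\.)*#{TOPLABEL}\.?|#{IPV4ADDRESS})\z/

# RFC 3986: reg-name = *( unreserved / pct-encoded / sub-delims )
RFC3986_REG_NAME = /\A(?:[A-Za-z0-9\-._~]|%\h\h|[!$&'()*+,;=])*\z/

SAMPLE_HOST = "-._~%2C!$&'()*+,;="

puts RFC2396_HOST.match?(SAMPLE_HOST)      # false: "-" cannot start a label
puts RFC3986_REG_NAME.match?(SAMPLE_HOST)  # true
```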

With the URI provided, the host is `-._~%2C!$&'()*+,;=`, which is valid according to the RFC 3986 ABNF:

```
- : unreserved
. : unreserved
_ : unreserved
~ : unreserved
%2C : pct-encoded
! : sub-delims
$ : sub-delims
& : sub-delims
' : sub-delims
( : sub-delims
) : sub-delims
* : sub-delims
+ : sub-delims
, : sub-delims
; : sub-delims
= : sub-delims
```
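
The same classification can be reproduced mechanically. The following is only a sketch: `classify_reg_name` is a hypothetical helper written for this comment, not part of the `uri` library, and it labels anything outside the three `reg-name` classes as `:invalid`:

```ruby
# Hypothetical helper: split a host into tokens (a percent-escape counts as
# one token) and label each token with its RFC 3986 reg-name class.
def classify_reg_name(host)
  host.scan(/%\h\h|./m).map do |token|
    kind =
      case token
      when /\A[A-Za-z0-9\-._~]\z/ then :unreserved
      when /\A%\h\h\z/            then :"pct-encoded"
      when /\A[!$&'()*+,;=]\z/    then :"sub-delims"
      else                             :invalid
      end
    [token, kind]
  end
end

classify_reg_name("-._~%2C!$&'()*+,;=").each do |token, kind|
  puts "#{token} : #{kind}"
end
```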

As to why RFC 3986 is used in some places (parse/join/split) and RFC 2396 everywhere else, I believe it is related to backwards compatibility.  Previously, there were some issues with `[` and `]` not being allowed in query parts in RFC 3986 (#10402), but those are now worked around.  However, `URI::RFC2396_Parser` and `URI::RFC3986_Parser` are not API compatible, so you cannot simply swap one for the other without breaking things.

In case you or someone else is interested in changing the default parser, attached is a minimal patch to make the RFC 3986 parser the default.  It passes the URI tests, but I haven't done any testing beyond that.  Hopefully it provides a decent starting point.

----------------------------------------
Bug #15979: URI.parse does not validate components
https://bugs.ruby-lang.org/issues/15979#change-81983

* Author: singpolyma (Stephen Paul Weber)
* Status: Rejected
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: 
* Backport: 2.4: UNKNOWN, 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
```ruby
URI.parse("https://-._~%2C!$&'()*+,;=:@-._~%2C!$&'()*+,;=:/foo?/-._~%2C!$&'()*+,;=:@/?")
```

happily returns a `URI::HTTPS` object, even though it has an invalid component and cannot be constructed using `URI::HTTPS.build`.

This is because the parser uses the undocumented initializer, which defaults to not validating the components.  I would suggest either passing that initializer the flag that enables validation, or having the parser use the build method instead.


---Files--------------------------------
uri-parse-validate-15979.patch (3.42 KB)

