Issue #12852 has been updated by Olivier Lacan.


Matthew Kerwin wrote:
> The rails snippet you linked is part of a HTML form. A web browser displaying and submitting that form would interpret the `✓` entity as U+2713 CHECK MARK, yes, but it would percent-encode it as `%E2%9C%93` before using it in a HTTP request, because HTTP uses URIs, not IRIs. (The browser may present it as a single Unicode character in the awesomebar/omnibar/address bar, but that's a UI presentation element and not a true and accurate display of the URI.)

It's common for OAuth authentication flows to store a destination URI to return to once the handshake completes. This URI can be stored without first passing through a web server that would encode it the way Rails does for submitted forms, since it isn't meant to be processed until it comes back to the origin server.

I opened this issue because of a problem I ran into during an OAuth provider handshake. You could certainly argue that I should be expected to `URI.encode` any URI set as a destination query parameter to prevent this from occurring.
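
To make the scenario concrete, here is a minimal sketch of how a return path can survive a round trip as a query parameter and still fail in `URI.parse` afterwards. The `provider.example` host and the `return_to` parameter name are assumptions for illustration only, not details of the actual provider involved.

```ruby
require "uri"

return_to = "/search?utf8=\u{2713}&q=foo"

# Hypothetical provider endpoint; only the query handling matters here.
authorize = URI.parse("https://provider.example/oauth/authorize")
authorize.query = URI.encode_www_form(return_to: return_to)

# Back on the origin server, decoding restores the raw, non-ASCII string.
restored = URI.decode_www_form(authorize.query).to_h["return_to"]

begin
  URI.parse(restored)
  parse_error = nil
rescue URI::InvalidURIError => e
  parse_error = e
end
```

The parameter value is fully percent-encoded in transit, but once decoded it is the original string again, so it still has to be escaped before `URI.parse` will accept it.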

Do you not agree that URI.parse should accept Unicode characters in URIs? It wasn't clear from your response.

I'm not aware of any IRI-compatible API in MRI that could allow me to directly parse URIs containing non-ASCII characters with Ruby, whether they match the strict definition of a URI or not. 

----------------------------------------
Bug #12852: URI.parse can't handle non-ascii URIs
https://bugs.ruby-lang.org/issues/12852#change-60940

* Author: Olivier Lacan
* Status: Open
* Priority: Normal
* Assignee: akira yamada
* ruby -v: 
* Backport: 2.1: UNKNOWN, 2.2: UNKNOWN, 2.3: UNKNOWN
----------------------------------------
Given a return URL path like: `/search?utf8=\u{2713}&q=foo`, `URI.parse` raises the following exception: 

```ruby
URI.parse "/search?utf8=\u{2713}&q=foo"
URI::InvalidURIError: URI must be ascii only "/search?utf8=\u{2713}&q=foo"
```

This `\u{2713}` character is commonly used by web frameworks like Rails to enforce UTF-8 in forms: https://github.com/rails/rails/blob/92703a9ea5d8b96f30e0b706b801c9185ef14f0e/actionview/lib/action_view/helpers/form_tag_helper.rb#L823-L830

```ruby
"\u{2713}"
=> "✓"
```

Is it unreasonable to expect the non-ASCII portions of URIs to be handled by URI.parse? The way to circumvent this issue is to call URI.encode on the URI string before passing it to URI.parse:

```ruby
URI.parse URI.encode("/search?utf8=\u{2713}&q=foo")
=> #<URI::Generic /search?utf8=%E2%9C%93&q=foo>
```
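
For cases where `URI.encode` is not available or escapes more than wanted, a similar result can be sketched by percent-encoding only the non-ASCII characters before parsing. The helper name `escape_non_ascii` is hypothetical and not part of the `uri` library; this is one possible approach, not a recommendation from the Ruby maintainers.

```ruby
require "uri"

# Hypothetical helper: percent-encode each non-ASCII character as its
# UTF-8 bytes, leaving ASCII characters (delimiters, existing %XX
# escapes) untouched.
def escape_non_ascii(str)
  str.gsub(/[^[:ascii:]]/) do |char|
    char.bytes.map { |byte| format("%%%02X", byte) }.join
  end
end

uri = URI.parse(escape_non_ascii("/search?utf8=\u{2713}&q=foo"))
uri.query # => "utf8=%E2%9C%93&q=foo"
```

Because only non-ASCII characters are rewritten, a string that is already a valid ASCII URI passes through unchanged.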

By comparison, a library like Addressable parses this URI without issue.

```ruby
require "addressable/uri"
Addressable::URI.parse "/search?utf8=\u{2713}&q=foo"
=> #<Addressable::URI:0x3feffa84158c URI:/search?utf8=✓&q=foo>
```

This is how Addressable implements parsing:
https://github.com/sporkmonger/addressable/blob/a15b7045a09911bcc47b106200554809c879a5f6/lib/addressable/uri.rb#L75-L145

PS: Tried under MRI 2.3.1 and 2.4.0-preview1


