Issue #17992 has been updated by k0kubun (Takashi Kokubun).

Status changed from Open to Feedback

Could you share more context on why you'd like to escape characters that `CGI.escapeHTML` does not support?


I believe `CGI.escapeHTML` has primarily been used to keep escaped content from breaking the DOM structure, with optimal performance. That behavior is very understandable to me, and for the best performance I would rather not escape any other characters, as long as leaving them unescaped is not considered a security vulnerability.
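
As a rough illustration of that scope (a minimal sketch; the sample strings are made up for this example), `CGI.escapeHTML` rewrites only the characters that can break markup and passes everything else through unchanged:

```ruby
require 'cgi/escape'

# Characters that can break HTML structure (& < > " ') are replaced.
markup = %q(<a href="/x">Tom & Jerry's</a>)
escaped_markup = CGI.escapeHTML(markup)
puts escaped_markup
# => &lt;a href=&quot;/x&quot;&gt;Tom &amp; Jerry&#39;s&lt;/a&gt;

# Other characters, such as accented letters or symbols, are left as-is.
text = "caf\u00E9 \u00A9 2021"
escaped_text = CGI.escapeHTML(text)
puts escaped_text == text # prints true
```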


```rb
require 'benchmark/ips'
require 'htmlentities'
require 'cgi/escape'

str = <<~HTML
  <body>
  <div>
      <h1>Example Domain</h1>
      <p>This domain is established to be used for illustrative examples in documents. You may use this
      domain in examples without prior coordination or asking for permission.</p>
      <p><a href="http://www.iana.org/domains/example">More information...</a></p>
  </div>
  </body>
HTML
coder = HTMLEntities.new

Benchmark.ips do |x|
  x.report("CGI.escapeHTML") { CGI.escapeHTML(str) }
  x.report("HTMLEntities #{HTMLEntities::VERSION::STRING}") { coder.encode(str) }
  x.compare!
end
```

```
ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-darwin19]
Warming up --------------------------------------
      CGI.escapeHTML   112.937k i/100ms
  HTMLEntities 4.3.4     1.029k i/100ms
Calculating -------------------------------------
      CGI.escapeHTML      1.131M (± 2.3%) i/s -      5.760M in   5.095252s
  HTMLEntities 4.3.4     10.281k (± 2.1%) i/s -     51.450k in   5.006333s

Comparison:
      CGI.escapeHTML:  1131036.5 i/s
  HTMLEntities 4.3.4:    10281.4 i/s - 110.01x  (± 0.00) slower
```

Note that `CGI.escapeHTML` is the default HTML escape method. If you suddenly replaced `CGI.escapeHTML` with that gem, you would make escaping every embedded Ruby expression about 110x slower.

We may want to support escaping some other characters for other use cases, but for backward compatibility and for performance in existing call sites, such a feature must be enabled by a new option or a separate method.
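
A minimal sketch of what the "separate method" route could look like, so that existing callers of `CGI.escapeHTML` keep their current behavior and speed. The module `ExtendedEscape` and the method `escape_html_with_entities` are hypothetical names invented for this example, not an existing or proposed API, and emitting numeric character references for non-ASCII characters is just one possible policy:

```ruby
require 'cgi/escape'

# Hypothetical opt-in helper; NOT part of CGI or the htmlentities gem.
module ExtendedEscape
  module_function

  # Escapes the markup-significant characters with the fast built-in
  # method first, then (as an assumed policy) replaces every non-ASCII
  # character with a decimal numeric character reference.
  def escape_html_with_entities(str)
    CGI.escapeHTML(str).gsub(/[^\x00-\x7F]/) { |ch| format('&#%d;', ch.ord) }
  end
end

puts ExtendedEscape.escape_html_with_entities("caf\u00E9 <b>")
# => caf&#233; &lt;b&gt;
```

Callers that need the broader escaping would opt in explicitly, while templates that rely on `CGI.escapeHTML` are unaffected.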

----------------------------------------
Bug #17992: Upstreaming the htmlentities gem into CGI#.(un)escape_html
https://bugs.ruby-lang.org/issues/17992#change-92506

* Author: AMomchilov (Alexander Momchilov)
* Status: Feedback
* Priority: Normal
* Backport: 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN
----------------------------------------
Hi there,

I was looking to unescape some HTML entities in a String, and I discovered that `CGI#.(un)escape_html` is **really** limited. Many StackOverflow questions share a similar disappointment, and point users to the [htmlentities gem](https://github.com/threedaymonk/htmlentities):

1. https://stackoverflow.com/a/383561/3141234
2. https://stackoverflow.com/a/22926384/3141234
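
To make the limitation concrete, here is a small sketch (the input string is made up for illustration): `CGI.unescapeHTML` decodes the basic markup entities but leaves other named entities such as `&copy;` and `&eacute;` untouched.

```ruby
require 'cgi/escape'

input = '&lt;p&gt;Caf&eacute; &copy; 2021 &amp; more&lt;/p&gt;'
unescaped = CGI.unescapeHTML(input)
puts unescaped
# => <p>Caf&eacute; &copy; 2021 & more</p>
```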

This solved my problem, but I feel like something this standard/universal should be built-in. To that end, I'm interested in working on merging the htmlentities gem into CGI's repo. Would this be a welcome change?

* I've e-mailed the author (Paul Battley) privately, and got his blessing to do so.
* It's MIT licensed, so that should be OK.



