Issue #17837 has been updated by duerst (Martin Dürst).


Eregon (Benoit Daloze) wrote in #note-10:

> I think fixing Timeout.timeout might be possible.
> The main/major issue is it can trigger within `ensure`, right? Is there anything else?
> We could automatically mask `Thread#raise` within `ensure` so it only happens after the `ensure` body completes.
> And we could still have a larger "hard timeout" if an `ensure` takes way too long (shouldn't happen, but one cannot be sure).
> I recall discussing this with @schneems some time ago on Twitter.

I created a separate issue for the improvement of Timeout.timeout: #17849. Please feel free to discuss there. My guess is that there are all kinds of other issues that can happen in a Web application, so it would be better to solve this for the general case.
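
The masking idea quoted above can be approximated by hand today with the existing `Thread.handle_interrupt`. The following is only a sketch of that manual form, not the automatic masking being suggested; the `Interrupted` class and the `run_with_masked_ensure` helper are hypothetical names made up for illustration.

```ruby
# Illustrative exception class; the name is an assumption for this sketch.
class Interrupted < StandardError; end

# Runs a worker whose `ensure` body masks Interrupted, then interrupts it
# twice: once in the main body and once while cleanup is still running.
def run_with_masked_ensure
  events = []
  worker = Thread.new do
    begin
      begin
        events << :body
        sleep # block until the first Thread#raise arrives
      ensure
        # Interrupted is queued rather than delivered inside this block.
        Thread.handle_interrupt(Interrupted => :never) do
          events << :cleanup_start
          sleep 0.2 # stands in for slow cleanup work
          events << :cleanup_done
        end
      end
    rescue Interrupted
      events << :interrupted
    end
  end

  sleep 0.01 until worker.status == "sleep"
  worker.raise(Interrupted) # interrupts the main body
  sleep 0.01 until events.include?(:cleanup_start)
  worker.raise(Interrupted) # arrives during cleanup, so it is deferred
  worker.join
  events
end

p run_with_masked_ensure
```

The second `Thread#raise` does not cut the cleanup short: `:cleanup_done` is always recorded before `:interrupted`.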

Dan0042 (Daniel DeLorme) wrote in #note-11:
> duerst (Martin Dürst) wrote in #note-9:
> > I very strongly suggest that this feature be voluntary, e.g. as an additional flag on the regular expression.
>
> If you have to turn it on for each regexp, that would make the feature kinda useless. I agree with the OP that this decision is at the application level.

I have no problems with making it possible to switch this on at the application level.

> You want it either on or off for all/most regexps. Although it would make sense to be able to override the default timeout for a few specific regexps that are known to be time-consuming or performance-critical.

Yes. My assumption is that when writing a regular expression, the writer should make sure it's well behaved. So in general, timeouts would only be needed for regular expressions that come from the outside.

> Rather than `CHECK_INTERRUPT_IN_MATCH_AT` would it be feasible to check for timeouts only when backtracking occurs?

In a backtracking regular expression engine, backtracking occurs very often. There are many cases of backtracking that are still totally harmless.

Ideally, a regular expression engine would deal with most regular expressions in a way similar to what RE2 (or any DFA-based implementation) does, and only use a timeout for those that a DFA-based strategy cannot handle (backreferences, ...). But that would require quite a bit of implementation work.

(Of course all the above discussion is predicated on the assumption that timeouts cannot be added to regular expressions with negligible speed loss.)

----------------------------------------
Feature #17837: Add support for Regexp timeouts
https://bugs.ruby-lang.org/issues/17837#change-91820

* Author: sam.saffron (Sam Saffron)
* Status: Open
* Priority: Normal
----------------------------------------
### Background

ReDoS vulnerabilities are a very common security issue. At Discourse we have seen a few through the years. https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS

In a nutshell, there are hundreds of ways this can happen in production apps; the key is for an attacker (or possibly an innocent person) to supply either a problematic Regexp or a bad string to test it with.

```
/A(B|C+)+D/ =~ "A" + "C" * 100 + "X"
```
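
To see why this input is a problem, one can time the same failed match on much shorter inputs. This is a rough sketch only; the helper name `failed_match_seconds` and the chosen lengths are assumptions for illustration. On a plain backtracking engine the time is expected to grow roughly exponentially with the number of `C`s; on engines or Ruby versions that mitigate this pattern, the timings may stay flat.

```ruby
require "benchmark"

# The pattern from the example above.
PATTERN = /A(B|C+)+D/

# Time one failed match against "A" + "C" * n + "X".
# The trailing "X" means the match can never succeed, so a backtracking
# engine may try many ways of splitting the run of Cs before giving up.
def failed_match_seconds(n)
  input = "A" + "C" * n + "X"
  Benchmark.realtime { PATTERN =~ input }
end

# Deliberately small sizes so the script finishes quickly everywhere.
[10, 14, 18].each do |n|
  printf("n=%2d  %.6f s\n", n, failed_match_seconds(n))
end
```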

Having a problem Regexp somewhere in a large app is a universal constant; it will happen as long as you are using Regexps.


Currently the only feasible way of supplying a consistent safeguard is by using `Thread#raise` and managing all execution. This kind of pattern requires usage of a third-party implementation. There are possibly issues with JRuby and TruffleRuby when taking approaches like this.
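
As a minimal sketch of this kind of safeguard, the standard library's `Timeout.timeout` (which interrupts the calling thread from outside) can wrap a single match. The helper name `match_with_limit` and the `:timeout` return value are assumptions made for illustration, and this inherits the caveats about interrupting arbitrary code discussed in this thread.

```ruby
require "timeout"

# Hypothetical helper: returns MatchData or nil when the match completes,
# or :timeout when the time limit is hit first.
def match_with_limit(regexp, string, seconds)
  Timeout.timeout(seconds) { regexp.match(string) }
rescue Timeout::Error
  :timeout
end

# Either :timeout or nil (no match), depending on how quickly the engine
# manages to reject this input.
p match_with_limit(/A(B|C+)+D/, "A" + "C" * 100 + "X", 0.5)
```

Every call site has to go through the wrapper, which is why this does not cover regular expressions buried inside gems.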

### Prior art

.NET provides a `MatchTimeout` property per: https://docs.microsoft.com/en-us/dotnet/api/system.text.regularexpressions.regex.matchtimeout?view=net-5.0

Java has nothing built in as far as I can tell: https://stackoverflow.com/questions/910740/cancelling-a-long-running-regex-match

Node has nothing built in as far as I can tell: https://stackoverflow.com/questions/38859506/cancel-regex-match-if-timeout


Go and Rust use RE2-style engines, which avoid this kind of DoS by limiting features (RE2 itself is available in Ruby through the re2 gem)

```
irb(main):003:0> r = RE2::Regexp.new('A(B|C+)+D')
=> #<RE2::Regexp /A(B|C+)+D/>
irb(main):004:0> r.match("A" + "C" * 100 + "X")
=> nil
```

### Proposal

Implement `Regexp.timeout`, which would allow us to specify a global timeout for all Regexp operations in Ruby.

A per-Regexp setting would require massive application changes; almost all web apps would do just fine with a 1 second Regexp timeout.

If `timeout` is set to `nil`, everything would work as it does today. When set to a number of seconds, a "monitor" thread would track running regexps and time them out according to the global value.
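
The intended semantics can be sketched in plain Ruby. This is only an approximation under stated assumptions: `RegexpGuard`, its `timeout` accessor, and `RegexpGuard::TimeoutError` are hypothetical names, not a Ruby API; it starts one short-lived monitor thread per match instead of a single shared one; and it relies on `Thread#raise`, so it is subject to the same races as any external interrupt.

```ruby
# Hypothetical stand-in for the proposed global setting. The module, its
# accessor, and the error class are illustrative names, not a Ruby API.
module RegexpGuard
  class TimeoutError < StandardError; end

  class << self
    # nil (the default) means "no limit", i.e. today's behaviour.
    attr_accessor :timeout

    def match(regexp, string)
      limit = timeout
      return regexp.match(string) if limit.nil?

      matcher = Thread.current
      # Monitor thread: interrupts the matching thread once the limit passes.
      monitor = Thread.new do
        sleep limit
        matcher.raise(TimeoutError, "regexp match exceeded #{limit}s")
      end

      begin
        regexp.match(string)
      ensure
        # Stop the monitor once the match has finished or been interrupted.
        monitor.kill
        monitor.join
      end
    end
  end
end

RegexpGuard.timeout = 1   # one global value for every guarded match
p RegexpGuard.match(/C+/, "ACCX")
RegexpGuard.timeout = nil # back to unlimited
```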

### Alternatives

I recommend against a "per Regexp" API as this decision is at the application level. You want to apply it to all regular expressions in all the gems you are consuming.

I recommend against a move to RE2 at the moment, as way too much would break.


### See also:

https://people.cs.vt.edu/davisjam/downloads/publications/Davis-Dissertation-2020.pdf
https://levelup.gitconnected.com/the-regular-expression-denial-of-service-redos-cheat-sheet-a78d0ed7d865
