Issue #14418 has been updated by jakub.wozny (Jakub Woźny).


OK, below is the regexp that I tested. I used UTF-8 encoding at the beginning:

~~~ ruby
"fußball "*20 =~ /([\S\s]{1000})/i
~~~

Some measurements:

~~~ ruby
require "benchmark"

(0..20).each { |n| puts Benchmark.measure { "fußball "*n =~ /^([\S\s]{1000})/i } }
  0.000000   0.000000   0.000000 (  0.000481)
  0.000000   0.000000   0.000000 (  0.000079)
  0.000000   0.000000   0.000000 (  0.000246)
  0.000000   0.000000   0.000000 (  0.000751)
  0.010000   0.000000   0.010000 (  0.002447)
  0.000000   0.000000   0.000000 (  0.006554)
  0.010000   0.000000   0.010000 (  0.007416)
  0.020000   0.000000   0.020000 (  0.022623)
  0.070000   0.000000   0.070000 (  0.066888)
  0.200000   0.000000   0.200000 (  0.196393)
  0.590000   0.000000   0.590000 (  0.591980)
  1.770000   0.000000   1.770000 (  1.772828)
  5.290000   0.010000   5.300000 (  5.292948)
 15.860000   0.000000  15.860000 ( 15.868370)
~~~
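
The wall-clock column above appears to roughly triple with each extra repetition of the string, which would point to exponential growth in the input length. A small sketch that computes the step-to-step ratios from the last six rows (the variable names are illustrative, and the figures are copied from the output above):

```ruby
# Wall-clock times in seconds, copied from the last six rows of the output above.
real_times = [0.066888, 0.196393, 0.591980, 1.772828, 5.292948, 15.868370]

# Ratio of each timing to the one before it.
ratios = real_times.each_cons(2).map { |prev, cur| (cur / prev).round(2) }

# Every ratio lands close to 3.
puts ratios.inspect
```

If the trend held, each further repetition would cost about three times as long as the previous one, which would explain why the larger values of n are impractical to wait for.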


I would expect this regexp to run as fast as the version without the ```/i``` flag.

~~~ ruby
"fußball "*20 =~ /([\S\s]{1000})/

(0..20).each { |n| puts Benchmark.measure { "fußball "*n =~ /^([\S\s]{1000})/ } }
  0.000000   0.000000   0.000000 (  0.000036)
  0.000000   0.000000   0.000000 (  0.000009)
  0.000000   0.000000   0.000000 (  0.000011)
  0.000000   0.000000   0.000000 (  0.000016)
  0.000000   0.000000   0.000000 (  0.000018)
  0.000000   0.000000   0.000000 (  0.000029)
  0.000000   0.000000   0.000000 (  0.000020)
  0.000000   0.000000   0.000000 (  0.000021)
  0.000000   0.000000   0.000000 (  0.000023)
  0.000000   0.000000   0.000000 (  0.000024)
  0.000000   0.000000   0.000000 (  0.000016)
  0.000000   0.000000   0.000000 (  0.000027)
  0.000000   0.000000   0.000000 (  0.000022)
  0.000000   0.000000   0.000000 (  0.000023)
  0.000000   0.000000   0.000000 (  0.000024)
  0.000000   0.000000   0.000000 (  0.000023)
  0.000000   0.000000   0.000000 (  0.000024)
  0.000000   0.000000   0.000000 (  0.000026)
  0.000000   0.000000   0.000000 (  0.000025)
  0.000000   0.000000   0.000000 (  0.000026)
  0.000000   0.000000   0.000000 (  0.000053)
~~~
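
To compare both variants in one run, a self-contained script along these lines may help. It is only a sketch: it assumes the sample word contains ß (U+00DF), `match_seconds` is a hypothetical helper name, and n is capped at 6 so that the `/i` variant still finishes quickly.

```ruby
require "benchmark"

# Assumption: the sample word contains U+00DF (sharp s), written as an escape
# so the script does not depend on the source file encoding.
SAMPLE = "fu\u00DFball "

# Wall-clock seconds for a single match attempt (hypothetical helper).
def match_seconds(text, pattern)
  Benchmark.realtime { text =~ pattern }
end

# Keep n small: with /i the time climbs steeply as n grows.
results = (0..6).map do |n|
  text = SAMPLE * n
  [n, match_seconds(text, /^([\S\s]{1000})/i), match_seconds(text, /^([\S\s]{1000})/)]
end

results.each do |n, with_i, without_i|
  puts format("%2d  /i: %.6f  no /i: %.6f", n, with_i, without_i)
end
```

Note that none of these strings can match at all: even 20 copies of the sample are far shorter than 1000 characters, so the time is spent entirely on a match that fails.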

Some more test cases:

~~~ ruby
Benchmark.measure { " "*20 =~ /^([\S\s]{20})/i } # 0.000000   0.000000   0.000000 (  0.000431)
Benchmark.measure { " "*20 =~ /^([\S\s]{30})/i } # 0.000000   0.000000   0.000000 (  0.000427)
Benchmark.measure { " "*20 =~ /^([\S\s]{40})/i } # 0.000000   0.000000   0.000000 (  0.000430)
Benchmark.measure { " "*20 =~ /^([\S\s]{50})/i } # too long to wait

# without the /i flag:
Benchmark.measure { " "*20 =~ /^([\S\s]{50})/ } # 0.000000   0.000000   0.000000 (  0.000043)
~~~

I also tested another encoding:

~~~ ruby
Benchmark.measure{("fußball ".encode("ISO-8859-1"))*20 =~ /([\S\s]{1000})/i}.to_s # => "  3.450000   0.000000   3.450000 (  3.452036)\n"
~~~

With this encoding too, removing **/i** speeds things up:

~~~ ruby
Benchmark.measure{("fußball ".encode("ISO-8859-1"))*20 =~ /([\S\s]{1000})/}.to_s # => "  0.010000   0.000000   0.010000 (  0.000514)\n"
~~~
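
Since ISO-8859-1 is a single-byte encoding, the result above suggests the slowdown is not tied to the multi-byte UTF-8 representation of the sample string. A quick sketch for checking what the two strings look like, again assuming the sample word contains ß (U+00DF); the variable names are illustrative:

```ruby
# Assumption: the sample word contains U+00DF (sharp s).
utf8   = "fu\u00DFball "
latin1 = utf8.encode("ISO-8859-1")

puts utf8.bytesize    # 9: the sharp s takes two bytes in UTF-8
puts latin1.bytesize  # 8: one byte per character in ISO-8859-1
puts latin1.encoding  # ISO-8859-1

# Both strings have the same number of characters, and 20 copies give
# only 160 characters, well short of the {1000} quantifier.
puts utf8.length == latin1.length  # true
puts (latin1 * 20).length          # 160
```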

> Reason I ask mostly is because I assume you output german text and
the german umlauts are one huge reason for me to prefer ISO encoding
(due to it being simpler for me to handle with it in a project, as
opposed to Unicode variants).

I have a multilingual app, so I need to stay with Unicode.




----------------------------------------
Bug #14418: ruby 2.5 slow regexp execution
https://bugs.ruby-lang.org/issues/14418#change-69983

* Author: jakub.wozny (Jakub Woźny)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: 2.5
* Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN
----------------------------------------
I have a simple regexp that performs very slowly:
~~~ ruby
"fußball "*20 =~ /^([\S\s]{1000})/i
~~~

It runs fast if I remove the ```/i``` flag. I found that it also depends on the string length and on the quantifier value (in this case ```{1000}```).
When you remove ```ß``` from the string, it also runs fast.

I tested on 2.3.1, 2.4.3 and 2.5.0.

I'm not sure whether this is a bug or just how it is meant to work.


