At 23:53 09/03/18, Andreas Grau wrote:
>Bug #1301: Poor RegExp Matching Performance
>http://redmine.ruby-lang.org/issues/show/1301
>
>Author: Andreas Grau
>Status: Open, Priority: Normal
>Category: core
>ruby -v: 1.9.0 (2008-06-20 revision 17482) [i486-linux]
>
>I noticed a very poor performance on matching regular expressions.
>
>Running following code using ruby 1.9.0 on a Core2Duo (2x2Ghz) takes more 
>than 20s to complete.
>
>regexp = /[b]+.+.+.+.+.+.+.+.+[a]/

Eero gave a much better way to write this regexp.
But why is it so bad, in particular for the string
below? The string has 37 'b's, and you are asking
it to try all different ways it can be split into
nine non-empty substrings (the first of which has
to consist of 'b's only), followed by an [a]. It's
surprising that this only takes 23 seconds! 

Theory about regular expressions (formal language theory)
says there shouldn't be any difference, but Ruby regular
expressions (same for Perl, Python, and so on) are not
really regular expressions in the formal sense anymore,
because they allow capturing and many more neat practical
things. So the implementation uses backtracking, and
your regexp above creates a lot of backtracking.
For details on how to write good regular expressions,
please see Jeffrey Friedl's book
(e.g. at
http://www.amazon.com/Mastering-Regular-Expressions-Jeffrey-Friedl/dp/0596528124/)

BTW, the [] in the above expression are not necessary,
it would usually be written

regexp = /b+.+.+.+.+.+.+.+.+a/

Regards,    Martin.

>str="bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"
>regexp.match(str)
>
>$ time ruby1.9  test.rb 
>
>real    0m23.029s
>user    0m22.421s
>sys     0m0.012s
>
>
>----------------------------------------
>http://redmine.ruby-lang.org


#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst / it.aoyama.ac.jp