This is a multi-part message in MIME format.
--------------040202000801080102020800
Content-Type: text/plain; charset=ISO-8859-15
Content-Transfer-Encoding: 7bit
The new Regexp engine still has an odd behavior which I think is a bug.
The following code tests four Regexps against the same long String:
require "benchmark"
require "strscan"

text = "test " * 100_000
regexps = [
  /\w+\s/,
  /\w+\s?/,
  /\w+\s*/,
  /\w+\s+/,
]

Benchmark.bm(20) do |bm|
  text = StringScanner.new text
  for r in regexps
    bm.report r.source do
      text.scan(r) || text.getch until text.eos?
    end
    text.reset
  end
end
The results on Mac OS X, Intel MacBook, are:
RUBY_VERSION # => "1.8.5"
# >>                           user     system      total        real
# >> \w+\s                 0.120000   0.000000   0.120000 (  0.117398)
# >> \w+\s?                0.120000   0.000000   0.120000 (  0.119532)
# >> \w+\s*                0.130000   0.000000   0.130000 (  0.128141)
# >> \w+\s+                0.120000   0.000000   0.120000 (  0.126463)
RUBY_VERSION # => "1.9.0"
# >>                           user     system      total        real
# >> \w+\s                 0.100000   0.000000   0.100000 (  0.097058)
# >> \w+\s?                0.610000   0.950000   1.560000 (  1.557211) (!)
# >> \w+\s*                0.620000   0.940000   1.560000 (  1.561926) (!)
# >> \w+\s+                0.610000   0.940000   1.550000 (  1.557454) (!)
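As you can see, on 1.9.0 the scans with the Regexps that have two
quantifiers take about 16 times longer than the scan with a single
quantifier (roughly 1.56 s instead of 0.10 s), which is much longer than
they should.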
The problem vanishes if you deactivate Oniguruma's
USE_COMBINATION_EXPLOSION_CHECK flag in regint.h (see the attached patch):
+/* #define USE_COMBINATION_EXPLOSION_CHECK /* (X*)* */
RUBY_VERSION # => "1.9.0"
# >>                           user     system      total        real
# >> \w+\s                 0.090000   0.000000   0.090000 (  0.092095)
# >> \w+\s?                0.090000   0.000000   0.090000 (  0.092578)
# >> \w+\s*                0.110000   0.000000   0.110000 (  0.100903)
# >> \w+\s+                0.090000   0.000000   0.090000 (  0.096832)
String#scan doesn't show this behavior, only StringScanner seems to be
affected. I'm not sure where the real problem is, so I'm contacting the
ruby core list instead of Mr. Kosako.
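
To compare the two code paths directly, a minimal sketch like the following
could be used. The helper names count_with_string_scan and
count_with_string_scanner are made up for this illustration, and the
benchmark labels are arbitrary; it only assumes the same test string as
above and one of the affected Regexps. Both helpers should find the same
number of tokens, so any timing difference comes from how the Regexp engine
is called:

```ruby
require "benchmark"
require "strscan"

# Hypothetical helper: count the tokens found by String#scan.
def count_with_string_scan(string, regexp)
  string.scan(regexp).size
end

# Hypothetical helper: count the tokens found by a StringScanner,
# using the same scan-or-getch loop as the benchmark above.
def count_with_string_scanner(string, regexp)
  scanner = StringScanner.new(string)
  count = 0
  until scanner.eos?
    if scanner.scan(regexp)
      count += 1
    else
      scanner.getch  # skip one character that the Regexp does not match
    end
  end
  count
end

string = "test " * 100_000
regexp = /\w+\s?/  # one of the Regexps with two quantifiers

Benchmark.bm(20) do |bm|
  bm.report("String#scan")   { count_with_string_scan(string, regexp) }
  bm.report("StringScanner") { count_with_string_scanner(string, regexp) }
end
```

If only the StringScanner row is slow, that would be consistent with the
observation above, but it does not by itself show where the problem is.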
The issue slows down my syntax highlighting library significantly, so it
currently can't benefit from the YARV speedups. I hope we can fix this
before 1.9.1 comes out.
Thanks!
[murphy]
--------------040202000801080102020800
Content-Type: text/plain; x-mac-type="0"; x-mac-creator="0";
 name="deactivate_combination_explosion_check.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="deactivate_combination_explosion_check.diff"
Index: regint.h
--- regint.h (revision 12155)
+++ regint.h (working copy)
@@ -100,7 +100,7 @@
 #include "ruby.h"
 #include "rubysig.h" /* for DEFER_INTS, ENABLE_INTS */
-#define USE_COMBINATION_EXPLOSION_CHECK /* (X*)* */
+/* #define USE_COMBINATION_EXPLOSION_CHECK /* (X*)* */
 #define USE_MULTI_THREAD_SYSTEM
 #define THREAD_ATOMIC_START DEFER_INTS
--------------040202000801080102020800--