The new Regexp engine still has an odd behavior that I think is a bug.
The following code benchmarks StringScanner#scan with four Regexps on the
same long String:

  require "benchmark"
  require "strscan"

  text = "test " * 100_000
  regexps = [
    /\w+\s/,
    /\w+\s?/,
    /\w+\s*/,
    /\w+\s+/,
  ]

  Benchmark.bm(20) do |bm|
    text = StringScanner.new text
    for r in regexps
      bm.report r.source do
        text.scan(r) || text.getch until text.eos?
      end
      text.reset
    end
  end
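
The loop `text.scan(r) || text.getch until text.eos?` either consumes a
whole match at the current position or, if the Regexp doesn't match there,
skips exactly one character. A minimal sketch that records each step on a
short input makes this concrete (the helper name `scan_steps` and the
sample string are hypothetical, chosen only for illustration):

```ruby
require "strscan"

# Hypothetical helper (not part of the original benchmark): runs the same
# "scan(r) || getch until eos?" loop, but records what each step consumed
# instead of discarding it.
def scan_steps(string, regexp)
  scanner = StringScanner.new(string)
  steps = []
  until scanner.eos?
    if (token = scanner.scan(regexp))
      # The Regexp matched at the current position: consume the match.
      steps << [:match, token]
    else
      # No match at the current position: skip exactly one character.
      steps << [:skip, scanner.getch]
    end
  end
  steps
end

p scan_steps("ab cd,", /\w+\s/)
# prints [[:match, "ab "], [:skip, "c"], [:skip, "d"], [:skip, ","]]
```

With /\w+\s*/ the same input gives `[[:match, "ab "], [:match, "cd"],
[:skip, ","]]`, because \s* is allowed to match nothing after "cd".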

The results on Mac OS X (Intel MacBook) are:

RUBY_VERSION  # => "1.8.5"
# >>                 user     system      total        real
# >> \w+\s       0.120000   0.000000   0.120000 (  0.117398)
# >> \w+\s?      0.120000   0.000000   0.120000 (  0.119532)
# >> \w+\s*      0.130000   0.000000   0.130000 (  0.128141)
# >> \w+\s+      0.120000   0.000000   0.120000 (  0.126463)

RUBY_VERSION  # => "1.9.0"
# >>                 user     system      total        real
# >> \w+\s       0.100000   0.000000   0.100000 (  0.097058)
# >> \w+\s?      0.610000   0.950000   1.560000 (  1.557211) (!)
# >> \w+\s*      0.620000   0.940000   1.560000 (  1.561926) (!)
# >> \w+\s+      0.610000   0.940000   1.550000 (  1.557454) (!)

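As you can see, the scans with the three Regexps that have two quantifiers
take roughly 15 times as long as they should (about 1.56 s instead of about
0.10 s real time), and most of the extra time is system time.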

The problem vanishes if you deactivate Oniguruma's
USE_COMBINATION_EXPLOSION_CHECK flag in regint.h by commenting out its
#define (see the attached patch):

+/* #define USE_COMBINATION_EXPLOSION_CHECK        /* (X*)* */

RUBY_VERSION  # => "1.9.0"
# >>                 user     system      total        real
# >> \w+\s       0.090000   0.000000   0.090000 (  0.092095)
# >> \w+\s?      0.090000   0.000000   0.090000 (  0.092578)
# >> \w+\s*      0.110000   0.000000   0.110000 (  0.100903)
# >> \w+\s+      0.090000   0.000000   0.090000 (  0.096832)

String#scan doesn't show this behavior; only StringScanner seems to be
affected. I'm not sure where the real problem is, so I'm contacting the
ruby-core list instead of Mr. Kosako.
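
To compare String#scan and StringScanner side by side, a variant of the
benchmark along these lines could be used. It is only a sketch: the helper
`scanner_matches` and the smaller text size are hypothetical choices, and
the timings it prints will depend on the Ruby version and build, so it
makes no claim about reproducing the numbers above. For these four
patterns (none of which can match the empty string) both approaches should
collect the same tokens, so any timing difference comes from how the match
is driven, not from what is matched.

```ruby
require "benchmark"
require "strscan"

# Hypothetical helper: collects only the matched tokens from the
# StringScanner loop used in the benchmark, skipping one character
# whenever the Regexp doesn't match at the current position.
def scanner_matches(string, regexp)
  scanner = StringScanner.new(string)
  matches = []
  until scanner.eos?
    token = scanner.scan(regexp)
    if token
      matches << token
    else
      scanner.getch # no match here: skip one character
    end
  end
  matches
end

text    = "test " * 10_000
regexps = [/\w+\s/, /\w+\s?/, /\w+\s*/, /\w+\s+/]

Benchmark.bm(22) do |bm|
  regexps.each do |r|
    # Same tokens either way; only the scanning strategy differs.
    bm.report("String#scan #{r.source}") { text.scan(r) }
    bm.report("StringScanner #{r.source}") { scanner_matches(text, r) }
  end
end
```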

The issue slows down my syntax highlighting library significantly, so it
currently can't benefit from the YARV speedups. I hope we can fix this
before 1.9.1 comes out.

Thanks!
[murphy]

Attachment: deactivate_combination_explosion_check.diff

Index: regint.h
--- regint.h	(revision 12155)
+++ regint.h	(working copy)
@@ -100,7 +100,7 @@
 #include "ruby.h"
 #include "rubysig.h"      /* for DEFER_INTS, ENABLE_INTS */
 
-#define USE_COMBINATION_EXPLOSION_CHECK        /* (X*)* */
+/* #define USE_COMBINATION_EXPLOSION_CHECK        /* (X*)* */
 #define USE_MULTI_THREAD_SYSTEM
 
 #define THREAD_ATOMIC_START          DEFER_INTS
