On Tue, 5 Sep 2006, Rick DeNatale wrote:

> So there are slight differences between using:
> PCR1 - A constant set to Regex('.')
> PCR2 - A constant set to Regex(/./)
> PCR3 A constant set to /./
> and
> LIT a literal in-line /./
>
> For this benchmark, for each string length tested, the techniques from
> fastest to slowest are;
> 130 Characters:
>  Rehearsal: PCR3(2.10), PCR1(2.21), PCR2(2.21), LIT(2.26)
>  "Live":       PCR2(2.34), PCR3(2.40), PCR1(2.53), LIT(2.60)
> 260 Characters:
>  Rehearsal: PCR2(2.18), PCR1(2.19), PCR3(2.25), LIT(2.25)
>  "Live":       PCR2(2.28), PCR3(2.40), PCR1(2.47), LIT(2.60)
> 520 Characters:
>   Rehearsal:  PCR3(2.14), PCR1(2.17), LIT(2.18), PCR2(2.19)
>   "Live":        PCR2(2.31), LIT(2.37), PCR1(2.49), PCR3(2.56)
> 1040 Characters:
>   Rehearsal: PCR1(2.11), PCR3(2.13), PCR2(2.19), LIT(2.28)
>  "Live":        PCR2(2.30),  LIT(2.33), PCR3(2.39), PCR1(2.48)
> 2080 Characters:
>   Rehearsal: PCR3(2.04), PCR2(2.14), LIT(2.19), PCR1(2.22)
>   "Live":       PCR3(2.36), LIT(2.37), PCR2(2.38), PCR1(2.52)
>
> So, at least from this benchmark it doesn't seem that in-line literal
> regular expressions are faster than pre-compiled ones.  In fact they
> never came in first, although PCR3 which was a constant set to a
> literal regex did win 4 times.

Those numbers look noisy to me. The ordering changes from run to run and 
the spread is small, so background load is probably affecting them more 
than the technique is.

a = Regexp.new('.')
b = Regexp.new(/./)
c = /./

irb(main):015:0> a == b
=> true
irb(main):016:0> b == c
=> true
irb(main):017:0> a == c
=> true

They all produce equivalent Regexp objects.
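
For anyone who wants to check this outside of irb, here is a minimal sketch. The variable names same_pattern, sources and options are mine, just for illustration. As far as I know, Regexp#== compares the pattern source and the options, not object identity, so this shows the three objects describe the same pattern, not that they are the same object.

```ruby
# Three ways of building the same one-character pattern
a = Regexp.new('.')
b = Regexp.new(/./)
c = /./

# Regexp#== compares pattern and options, not object identity
same_pattern = (a == b) && (b == c) && (a == c)

# The pattern source and option bits of each object, for inspection
sources = [a, b, c].map(&:source)
options = [a, b, c].map(&:options)

puts same_pattern
p sources
p options
```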

Further, you are benchmarking scan, which isn't the same as benchmarking 
a simple regular expression match.
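
To make the difference concrete, here is a small sketch (the string is made up for the example). scan walks the whole string and builds an array of every match, while =~ stops at the first match and returns its index. With a pattern like /./, scan does work for every character, so the loop inside scan probably dwarfs whatever it costs to get hold of the Regexp object.

```ruby
text = "abc"   # hypothetical sample string

# scan visits the entire string and collects every match into an array
all_matches = text.scan(/./)

# =~ stops at the first match and returns its index (or nil if none)
first_index = text =~ /./

p all_matches
p first_index
```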

In a simple matching operation, an inline regexp, in the form of:

foo =~ /bar/

is the fastest way to match.

Ruby specially optimizes that style.  Eric Hodel explained it in a post 
from...some time this summer, I think.

bar = /bar/
bar = Regexp.new('bar')
bar = Regexp.new(/bar/)

are all equivalent, and

foo =~ bar

will be slower.

Bar = /bar/
foo =~ Bar

will be slower yet.

Any variation that calls match() will be slower still, by a substantial 
margin.

So, for regular expression matching, the fastest, if not prettiest, 
approach is to use the '=~ /expression/' syntax.
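
If you want to measure this yourself, something along these lines should do as a rough sketch. The subject string, the iteration count n and the labels are my own choices, not taken from the original benchmark, and the relative timings will depend on your Ruby version and on what else the machine is doing, so run it a few times before drawing conclusions.

```ruby
require 'benchmark'

n = 200_000            # iteration count, an arbitrary choice for this sketch
BAR = /bar/            # constant holding a precompiled Regexp
foo = 'foo bar baz'    # hypothetical subject string
bar = /bar/            # local variable holding a precompiled Regexp

# Time each matching style over the same number of iterations
timings = {
  'inline literal' => Benchmark.realtime { n.times { foo =~ /bar/ } },
  'local variable' => Benchmark.realtime { n.times { foo =~ bar } },
  'constant'       => Benchmark.realtime { n.times { foo =~ BAR } },
  'match()'        => Benchmark.realtime { n.times { bar.match(foo) } }
}

# Print elapsed wall-clock seconds for each style
timings.each { |label, secs| printf("%-15s %.4fs\n", label, secs) }
```

This times only a plain match against a short string, which is the case I am talking about above. It says nothing about scan.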


Kirk Haines