--- "Ara.T.Howard" <Ara.T.Howard / noaa.gov> wrote:

> On Sat, 3 Sep 2005, Robert Klemme wrote:
> 
> >> here's some things i think of:
> >
> > <lots of good stuff deleted>
> >
> >>    - anchor every single regular expression : performance degrades
> >>      exponentially otherwise
> >
> > That's not possible - especially with String#scan.  Sometimes you want all
> > occurrences of something.
> 
> true.  however there is nearly always __some__ anchor one can use:
> 
>    harp:~ > irb
>    irb(main):001:0> "foo bar".scan %r/ \b\w+ /x
>    => ["foo", "bar"]
> 
> a better thing would be to say "use an anchor unless you know exactly why you
> shouldn't."

Regarding performance, using just any anchor won't necessarily
help.  The important ones are \A (beginning of string) and \G
(where the last match stopped).  You should be able to rewrite
just about any regexp that doesn't use one of these so that it
does and still matches the same thing.  For example, this:

/<non-anchored regexp>/m

is equivalent to:

/\A.*?<non-anchored regexp>/m

I used "m" (multi-line) so that "." matches anything, including
newlines.  One should realize that a regexp not anchored with
\A or \G behaves as if it had that leading .*? when matching,
and pays the associated performance penalty.  Usually, you can
replace the .*? with something more specific to your
application.
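
To make the equivalence concrete, here is a small sketch.  The
sample text and the \d+ pattern are made up just for
illustration; the parentheses are added only so the two results
can be compared:

```ruby
# Hypothetical sample text and pattern, chosen only for illustration.
text = "line one\nfoo 42 bar\n"

unanchored = /\d+/m          # the regexp engine scans forward on its own
anchored   = /\A.*?(\d+)/m   # the same scan-forward, written out as .*?

m1 = unanchored.match(text)
m2 = anchored.match(text)

puts m1[0]                        # text matched by the unanchored regexp
puts m2[1]                        # same text, captured after the .*? prefix
puts m1.begin(0) == m2.begin(1)   # both matches start at the same offset
```

Both forms find the same digits at the same offset; the second
one just makes the skipping visible.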

Regarding your scan case above, the \G anchored way to do it
would be:

"foo bar".scan(/\G\s*\w+/)

Of course this returns spaces in the result.  You could group
the \w+, but then you get another Array level, which degrades
performance.
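
A quick sketch of the two variants, using the same "foo bar"
input as above (the variable names are just for illustration):

```ruby
input = "foo bar"

# \G-anchored, no group: the leading whitespace is part of each match.
with_spaces = input.scan(/\G\s*\w+/)
p with_spaces   # => ["foo", " bar"]

# Grouping the \w+ drops the whitespace, but scan now returns one
# Array per match, so there is an extra level to deal with.
grouped = input.scan(/\G\s*(\w+)/)
p grouped           # => [["foo"], ["bar"]]
p grouped.flatten   # => ["foo", "bar"]
```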

I think \A or \G anchoring is also better because you get
better syntax checking of your input - nothing that comes
before the thing you want is silently ignored.
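
Here is a rough sketch of what I mean by syntax checking.  The
input with "???" in it is a made-up example of something that
doesn't fit the expected format:

```ruby
# Hypothetical input containing something that is not a word.
input = "foo ??? bar"

loose  = input.scan(/\w+/)        # unanchored: skips right over the "???"
strict = input.scan(/\G\s*\w+/)   # \G: stops at the first thing that doesn't fit

p loose    # => ["foo", "bar"]
p strict   # => ["foo"]

# With \G every match starts where the previous one ended, so the
# matches are contiguous from offset 0 and their total length tells
# us how much of the input was actually consumed.
consumed = strict.join.length
if consumed < input.length
  puts "unexpected input at offset #{consumed}: #{input[consumed..-1].inspect}"
end
```

Note that this simple length check would also flag trailing
whitespace as unexpected; whether that is desirable depends on
the application.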

It would be nice if many of the regexp matching methods had
some option to implicitly anchor your regexp.  About the only
ones that do this are the methods in StringScanner.
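
For comparison, a minimal sketch with StringScanner from the
standard strscan library.  Its scan and skip methods only match
at the current scan position, so no explicit \G is needed.  The
helper name words_strict is made up for this example:

```ruby
require 'strscan'

# Hypothetical helper: split text into words, rejecting anything else.
def words_strict(text)
  scanner = StringScanner.new(text)
  words = []
  until scanner.eos?
    scanner.skip(/\s+/)          # only matches at the current position
    break if scanner.eos?        # allow trailing whitespace
    word = scanner.scan(/\w+/)   # nil if a word does not start right here
    raise ArgumentError, "unexpected input at offset #{scanner.pos}" unless word
    words << word
  end
  words
end

p words_strict("foo bar")   # => ["foo", "bar"]
```

Calling it on something like "foo ??? bar" raises instead of
quietly skipping the part that doesn't match.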



		