S3 wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> I am trying to write a simple script to parse
> an Apache log file, but it is taking an extremely
> long time.  I used profile and the problem appears
> to be with the regular expression matcher.
>
> I have made a simple script to run it with different
> lengths and it appears that the regular expression
> matcher is being very slow.  See the attached script.
>
> Here are some timings:
> ~>time ./mklong.rb 1000
> real    0m11.930s
> ~>time ./mklong.rb 2000
> real    0m50.400s
> ~>time ./mklong.rb 3000
> real    1m55.693s
> ~>time ./mklong.rb 4000
> real    3m16.004s
>
> So, dividing it out,
> 1000/11  = 91
> 2000/50  = 40
> 3000/120 = 25
> 4000/200 = 20
>
>
> So, the matching appears to be much slower than O(n).
> Isn't the whole point of regular expressions to be
> fast and O(n)?
> Whenever my script encounters a long string,
> it grinds to a halt.
>
> Why is this?
> Did I make the regular expression correctly?
> Is there some way to optimize it?
> Is there a problem with the matcher?
>
> Any help would be appreciated.
> <snip code>
>   
I'm not a regex expert, so I can't help you with optimizing yours, but 
if your pattern causes the regex engine to backtrack repeatedly, then it 
will not match in linear time. This usually becomes apparent on long 
strings where there can be lots of backtracking.

Sorry I can't offer more help than that.

-Justin