Ara.T.Howard wrote: > On Sat, 3 Sep 2005, Nikolai Weibull wrote: > > Ara.T.Howard wrote: > > > - anchor every single regular expression : performance degrades > > > exponentially otherwise > > That's an overstatement. True, use anchors whenever possible, as > > they are great hints to the engine and they certainly limit the > > number of possible matches very often, but it's not like every > > regular expression without anchors takes O(N^M), > i don't think it is an overstatement. i'm not saying every regex w/o > anchor __is__ O(N^M) - but that they easily __degrade__ to that worst > case more easily w/o them. a single misplaced '.*' can do this. one > time i was examining a processing machine that uses regexes to match > keys of items that arrive over the network and are inserted into a > giant in memory circular queue > (http://my.unidata.ucar.edu/content/software/ldm/archive/index.html) > because it's cpu usage had shot up from around 5 to 50% for no > apparent reason. the keys were not large (10-100 bytes) and the > regexes ldm is configured with tend to be pretty simple, things like > > /F10.*OLS/ /F12.*OLS/ > > and these are matched against strings like > > F10200008100242.OLS Well, what you said was that "performance degrades exponentially otherwise". I don't have your whole dataset, but it looks to me that you could be using a wild-card matcher instead. Another good solution would be to not use Ruby's built-in regular expression matcher, as it's implemented using backtracking, which, you are correct, has a tendency to go haywire far too often. That's the price you pay for having crap like backreferencing and look-around I'm afraid. (Sidenote: I'm releasing a regular expression matcher for Ruby in a near future that is quite nice.) Anyway, your advice is sound; the more information you give a backtracking NFA implementation, the better, nikolai -- Nikolai Weibull: now available free of charge at http://bitwi.se/! Born in Chicago, IL USA; currently residing in Gothenburg, Sweden. main(){printf(&linux["\021%six\012\0"],(linux)["have"]+"fun"-97);}