On 12.04.2007 18:31, MenTaLguY wrote: > On Thu, 12 Apr 2007 16:10:06 +0900, Robert Klemme <shortcutter / googlemail.com> wrote: >>> Every NFA can be rewritten as a DFA. Each state in the resulting DFA >>> corresponds to a superposition of states in the original NFA >> While this is true in theory there is a number of modern RX features >> that are extremely difficult or impossible to do with a DFA. So as it >> stands now for pratical purposes most modern RX engines are NFA or >> hybrids, > > It's important not to confuse NFAs with backtracking. For instance, Thompson's algorithm uses NFAs, but it evaluates by rewriting them on-the-fly as DFAs, not by performing a backtracking search. Again, NFAs and DFAs are equivalent, even if the naive algorithm (backtracking) for evaluating an NFA is less efficient than the naive algorithm for evaluating a DFA. > > On the other hand, those modern features which really do require backtracking (surprisingly few) also exceed the capabilities of finite automata (i.e. those features correspond to non-regular languages), so at that point the regular expression engines cannot be said to be using FAs at all, but a more general type of automaton. Strinctly speaking yes. But AFAIK the basis is usually a NFA. Btw, because of the nature of the engine there are some more differences between DFA and NFA, for example typically order of expressions in an alternative matters with a NFA engine. I prefer that often, because it allows to control the way the RX engine matches to a certain extent, i.e. you can prioritize alternatives which can be pretty handy at times. > Again, though we must obviously keep backtracking in reserve for those patterns that need it, it is very wasteful not to use a pure FA approach for those patterns where non-regular features aren't being used. > >> sometimes (Perl) with heavy optimizations to prevent certain >> ugly effects (see "Mastering Regular Expressions" for details). > > "Mastering Regular Expressions" is very good for learning the pragmatics of working with contemporary regular expression implementations, but it is not a good source for theory. Since I had the theory at university and at the moment I'm more interested in using them than writing them this is good enough for me. :-) Kind regards robert