Stefan Scholl <stesch / no-spoon.de> wrote in message news:<87sn6uw54n.fsf / parsec.no-spoon.de>...
> OK. No endless loop. It's just very, very slow.
> 
> My program needs about 45 minutes (REXML 2.0) instead of 1 3/4 minutes
> (REXML 1.2.7).

Yup.  You can thank, by and large, XPath itself for this.

REXML 1.2.7 had incomplete XPath support.  As XPath support in REXML
has gotten more complete, it has also gotten slower.  XPath doesn't
lend itself to optimization; for instance, it requires some path
evaluations that remove the option of building up a large array of
possible nodes and then filtering that array.  Instead, in these cases
the XPath must be evaluated independently on each node.
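
To make that concrete, here is a minimal sketch in plain Ruby (no REXML
involved) of a positional predicate such as //item[1].  The parents
hash and its contents are made up for illustration.  The predicate
means "the first item within each parent", so filtering one big
flattened array gives a different answer than evaluating per parent.

```ruby
# Toy data (hypothetical): each parent element with its <item> children.
parents = {
  "a" => ["a1", "a2"],
  "b" => ["b1", "b2", "b3"]
}

# Flatten-then-filter: collect every item first, then apply [1] once.
flat_first = parents.values.flatten.first(1)

# Per-context evaluation: apply [1] separately inside each parent.
per_parent_first = parents.values.map { |items| items.first }

puts flat_first.inspect        # ["a1"]
puts per_parent_first.inspect  # ["a1", "b1"]
```

Only the second result matches what XPath asks for, which is why the
cheap "one big array" strategy is not available for paths like this.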

There are some things I /might/ be able to do; the 2.0 XPath is, after
all, a virgin rewrite.  However, I don't expect that you'll see an
order of magnitude speed increase.

Your only option is to use the old XPath from 1.2.7.  I'll include it
with the 2.1 release.  Since the API hasn't changed, you should be
able to drop it right in.  What I did was copy the 1.2.7 xpath.rb to
quickpath.rb, and then replace all occurrences of "XPath" (case
sensitive) with "QuickPath" in the file.  Then you can choose to use
either complete, correct, and slow XPath, or quick and dirty XPath. 
In many cases, quick and dirty is accurate; the difference in
completeness mostly has to do with predicates, so if your predicates
aren't complicated, you are probably safe with quick and dirty.
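
The copy-and-rename step amounts to something like the sketch below.
The source string is a made-up stand-in, not the real contents of the
1.2.7 xpath.rb; the point is only that the replacement is case
sensitive, so lowercase "xpath" is left alone.

```ruby
# Hypothetical excerpt standing in for the 1.2.7 xpath.rb source.
old_source = <<~RUBY
  module REXML
    class XPath
      # xpath helpers live here
      def XPath.first(element, path = nil)
      end
    end
  end
RUBY

# String#gsub with a plain string pattern is case sensitive:
# "XPath" becomes "QuickPath", while lowercase "xpath" is untouched.
quick_source = old_source.gsub("XPath", "QuickPath")

puts quick_source
```

With a real file, the same gsub would be applied to the text read from
xpath.rb and the result written out as quickpath.rb.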

I'll try to find a better solution, and believe me... I'll work on
optimization.  The only thing I can tell you is that the current XPath
implementation is a lot better than it would have been had I used a
real lexer-generated parser, since I'm using loops rather than
recursion in a lot of places.
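
As a rough illustration of the loops-versus-recursion point, here is a
sketch of a descendant walk written both ways.  Node is a hypothetical
stand-in type, not a REXML class, and this is not the code REXML itself
uses.

```ruby
# Hypothetical minimal node type; REXML's real classes are richer.
Node = Struct.new(:name, :children)

# Recursive walk: one nested method call per node visited.
def descendants_recursive(node, out = [])
  node.children.each do |child|
    out << child.name
    descendants_recursive(child, out)
  end
  out
end

# Iterative walk: an explicit stack replaces the nested calls.
# Children are pushed in reverse so they pop in document order.
def descendants_iterative(node)
  out = []
  stack = node.children.reverse
  until stack.empty?
    current = stack.pop
    out << current.name
    stack.concat(current.children.reverse)
  end
  out
end

tree = Node.new("root", [
  Node.new("a", [Node.new("a1", []), Node.new("a2", [])]),
  Node.new("b", [])
])

puts descendants_recursive(tree).inspect  # ["a", "a1", "a2", "b"]
puts descendants_iterative(tree).inspect  # ["a", "a1", "a2", "b"]
```

Both versions visit the nodes in the same order; the loop version just
avoids the per-node method call and the deep call stack.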