"Randy Kramer" <rhkramer / gmail.com> schrieb im Newsbeitrag 
news:200503181244.16009.rhkramer / gmail.com...
> This is going to seem a little strange (for a number of reasons I might
> mention below), but I would like to iterate through a string from 
> beginning
> to end, examining each character, then discarding (or moving) them
> (elsewhere).
>
> This is so that I can, upon finding certain character(s), run one of 
> several
> REs on the next part of the string.  I'm looking to find a very efficient 
> way
> to do this.
>
> Here are the approaches I'm considering so far.  Any other suggestions?
>
>   * ?? (Is there any method which chops the first character from the 
> string?
> How efficient is that (especially compared to the method described in the
> next bullet).
>   * Reverse the string, then use something like s[length (-1)] to examine
> then chop to discard the last character.  The problem with this approach 
> is
> that I then have to reverse the string again to have the REs work.  (I
> (briefly) thought about just setting up the REs in reverse, but I foresee 
> a
> lot of difficulty there--scanning through the string in reverse (from last
> character to first, negates the advantage I was hoping to gain, that of
> starting the RE match only from positions where the RE could possibly 
> match.)
>
> Barring anything better, the approach I may take is to iterate through the
> string and, when I find a potential RE match, use s[n,len] to return a
> partial string to be checked against the RE.

I'd bet that this approach is slower than a pure regexp based approach.  If 
you cannot stick all exact regexps into one (see below), then maybe some form 
of stripped regexps might help.  For example:

rx1 = /ab+/
rx2 = /cd+/

rx_all = /(ab+)|(cd+)/

rx_stripped = /[ac](\w+)/   # first characters of rx1 and rx2
# then, use these on the captured remainder
rx_stripped_1 = /^b+/
rx_stripped_2 = /^d+/

This is just a simple example for demonstration.  For regexps this simple, 
I'm sure rx_all is the most efficient one.
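To make the rx_all idea concrete, here is a minimal sketch (the sample string is made up for illustration): the capture groups in the alternation tell you which of the original regexps matched, so a single pass over the string is enough to dispatch to the right handler.

```ruby
rx1 = /ab+/
rx2 = /cd+/
rx_all = /(ab+)|(cd+)/

# scan yields one array of captures per match; exactly one
# group is non-nil, identifying which alternative fired.
"xxabbbyycdz".scan(rx_all) do |ab, cd|
  if ab
    puts "rx1 matched: #{ab}"   # prints "rx1 matched: abbb"
  else
    puts "rx2 matched: #{cd}"   # prints "rx2 matched: cd"
  end
end
```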

> Aside: I'm trying to make a very efficient parser in Ruby for a wiki-like
> thing I want to build, and am trying to avoid the approach of simply 
> making
> multiple scans through the document for each of a fairly large number of 
> REs.

What does "fairly large" mean?  I would start by sticking *all* these 
regexps into one - if the regexp engine does not choke on it, I'd assume 
that this is the most efficient way to do it, since you then have the best 
ratio of machine code to Ruby interpretation.  Maybe you could show us all 
these regexps so we can better understand the problem.
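Combining a larger set of regexps does not have to be done by hand; a sketch (the wiki-ish token patterns here are invented, since we have not seen your actual regexps yet) using Regexp.union, which builds one big alternation so the engine scans the text only once:

```ruby
# Hypothetical wiki tokens: links, bold markup, headings.
token_rxs = [/\[\[\w+\]\]/, /'''.*?'''/, /==+.*?==+/]

# One alternation from many regexps; each source regexp's
# options and escaping are preserved.
rx_all = Regexp.union(*token_rxs)

text = "see [[SomePage]] and '''bold''' text"
text.scan(rx_all) { |tok| puts tok }
# prints:
#   [[SomePage]]
#   '''bold'''
```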

> I might be guilty of premature optimization, but I prefer to think of it 
> as
> doing some proof of concept testing before committing to a design.
>
> I have done some tests that show a 1 to 10% savings in time by taking a
> similar approach for REs that could only match at the beginning of a 
> string.
> (At some point I'll "publish" those results on WikiLearn (or the next
> incarnation thereof).)  The next REs are considerably more complex as they
> can match anywhere in the string--if the savings from the same approach 
> for
> them is only 1 to 10%, the complexity will not be worth it.  If by some
> chance it exceeds say 50%, I will seriously consider that complexity.

Now I'm getting really curious.  Care to post some more details?

Kind regards

    robert