Thanks to all who replied so far.  I also want to look into the StringScanner 
approach (I'll reply separately with some questions about that), and I can't 
believe I couldn't find a way to delete the first character of a string.
Guess I am a newbie!
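
For the record, here are a few standard ways to drop the first character of a
string. String#[] with a range, String#slice! and String#sub are core methods.
The sample string is made up. slice!(0, 1) is used rather than slice!(0)
because, as far as I know, the single-index form returned a character code
instead of a string in Ruby 1.8.

```ruby
s = "*bold*"

# Non-destructive: a new string without the first character.
rest = s[1..-1]

# Destructive: slice! removes the first character in place and returns it.
t = s.dup
first = t.slice!(0, 1)

# Regexp form: \A anchors at the very start, /m lets . match a newline too.
u = s.sub(/\A./m, "")

p rest, first, t, u
```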

On Saturday 19 March 2005 08:04 am, Robert Klemme wrote:
> I'd bet that this approach is slower than a pure regexp based approach.  

So far, you're quite right: my approach took about 30 times as long as the pure
regexp approach, although my Ruby code may not be very efficient.  (In case
nobody noticed, I'm very much a newbie to Ruby.)
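
To make a comparison like that repeatable, the standard Benchmark library can
time the two styles side by side. The sketch below is hypothetical. The
character-at-a-time scanner is a stand-in for a hand-written loop, not my
actual code, and the input text is made up. Timings will vary by machine and
Ruby version.

```ruby
require "benchmark"

# Hypothetical input: many lines, each with one *bold* span.
TEXT = ("plain words *bold words* more plain words\n" * 2_000).freeze

# Hypothetical hand-written scanner: walk the string one character at a time
# and collect whatever sits between pairs of '*'.
def bold_by_chars(text)
  found = []
  current = nil            # nil while outside a bold span
  text.each_char do |c|
    if c == "*"
      if current
        found << current.join
        current = nil
      else
        current = []
      end
    elsif current
      current << c
    end
  end
  found
end

# Same job done by the regexp engine.
def bold_by_regexp(text)
  text.scan(/\*([^*]+)\*/).flatten
end

Benchmark.bm(8) do |x|
  x.report("chars:")  { bold_by_chars(TEXT) }
  x.report("regexp:") { bold_by_regexp(TEXT) }
end
```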

> If
> you cannot stick all exact regexps into one (see below), then maybe some
> form of stripped regexps might help.  For example:

This sounds like it's worth a try, but:
   1) I haven't created all the necessary REs yet
   2) Question below (for clarification)

> rx1 = /ab+/
> rx2 = /cd+/
>
> rx_all = /(ab+)|(cd+)/
>
> rx_stripped = /[ab](\w+)/

Question: IIUC, the [ab] above should be [ac]?

> # then, use these on the second part
> rx_stripped_1 = /^b+/
> rx_stripped_2 = /^d+/
>
> This is just a simple example for demonstration.  For these simple regexps
> rx_all is the most efficient one I'm sure.
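
The way I read the stripped idea, it works as a two-stage match. One cheap
regexp finds a candidate and captures its first character. Then only the
matching second-stage regexp runs on the rest. Here is a minimal sketch of
that. It assumes the character class is meant to be [ac], the first
characters of ab+ and cd+. It uses \A instead of ^ because the second stage
only sees the captured remainder. The classify helper is made up.

```ruby
# Stage 1: find a candidate, capturing its first character and the remainder.
# Assumes [ac] (first characters of ab+ and cd+), not [ab].
RX_STRIPPED = /([ac])(\w+)/

# Stage 2: one anchored regexp per possible first character.
SECOND_STAGE = {
  "a" => /\Ab+/,
  "c" => /\Ad+/,
}

# Hypothetical helper: returns the matched token, or nil if stage 2 rejects it.
def classify(text)
  m = RX_STRIPPED.match(text) or return nil
  rest = SECOND_STAGE.fetch(m[1]).match(m[2]) or return nil
  m[1] + rest[0]
end

p classify("xx abbb yy")
p classify("axyz")
```

For patterns as simple as these, the single alternation (rx_all) is likely
faster, as Robert says. The split only pays off if stage 1 rejects most of the
text cheaply.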

> What does "fairly large" mean?  I would try to start with sticking *all*
> these regexps into one - if the rx engine does not choke on that regexp I'd
> assume that this is the most efficient way to do it, as then you have the
> best ratio of machine code to ruby interpretation.  Maybe you could just
> show us all these regexps so we can better understand the problem.
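
Here is a minimal sketch of sticking several regexps into one while still
knowing which one matched. Each rule is wrapped in its own capture group, and
the index of the non-nil group names the rule. The three rules are
hypothetical placeholders for the real TWiki ones. They must not contain
capture groups of their own, or the group numbering shifts.

```ruby
# Hypothetical rules; order matters, earlier alternatives win at a position.
RULES = {
  heading: /^---\*{1,6}[ \t]/,
  bold:    /\*\S[^*]*\*/,
  italic:  /_\S[^_]*_/,
}
NAMES = RULES.keys

# "#{rx}" uses Regexp#to_s, which keeps each rule's own flags, e.g. (?-mix:...).
COMBINED = Regexp.new(RULES.values.map { |rx| "(#{rx})" }.join("|"))

def tokens(text)
  text.scan(COMBINED).map do |groups|
    i = groups.index { |g| !g.nil? }   # which alternative matched
    [NAMES[i], groups[i]]
  end
end

p tokens("---** Title\nsome *bold* and _italic_ text\n")
```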

It's hard even to guess.  I intended to combine several REs into one anyway
where they had a lot in common.  For example, the TWiki markup for headings
(which I'm planning to use) looks like this:

---* Level 1
---** Level 2
---*** Level 3
---**** Level 4
---***** Level 5
---****** Level 6

I plan to use one RE for all of the above, then determine the level from the
length of the match (level = length - 3).
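
Here is a minimal sketch of that single heading RE. It follows the `---*`
markup exactly as listed above. It captures the marker (dashes plus asterisks)
so the level is the same "length - 3" arithmetic. The heading_level helper is
a made-up name.

```ruby
# Three dashes, one to six asterisks, spaces/tabs, then the heading text.
HEADING = /^(---\*{1,6})[ \t]+(.*)$/

# Hypothetical helper: returns [level, text], or nil if the line is not a heading.
def heading_level(line)
  m = HEADING.match(line) or return nil
  [m[1].length - 3, m[2]]   # marker length minus the three dashes
end

p heading_level("---*** Level 3")
```

Capturing only the asterisks, as in /^---(\*{1,6})/, would give the level
directly as $1.length. It comes to the same thing.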

Likewise, "inline" markup is *for bold*, _for italic_, __for bold italic__,
and so forth.  I'd try to have one RE looking for words preceded by _, *, or
__, and another for words ending with the same.  (And I might combine words
marked with % for %TWikiVariables% as well.)
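
As an alternative to separate "opening" and "closing" REs, a single RE with a
backreference (\1) can require the closing marker to match the opening one.
That is a swapped-in technique, not the two-RE plan above. The sketch assumes
Ruby 1.9 or later for the (?<!\S) lookbehind. The rules for what may follow a
closing marker, and the style names, are my own guesses.

```ruby
# (?<!\S)             marker starts the text or follows whitespace
# (__|\*|_|%)         opening marker; "__" is tried before "_"
# (\S(?:.*?\S)?)      marked text: no leading/trailing space, shortest span
# \1                  the same marker closes it
# (?=[\s,.;:!?)]|\z)  followed by space, punctuation, or end of string
INLINE = /(?<!\S)(__|\*|_|%)(\S(?:.*?\S)?)\1(?=[\s,.;:!?)]|\z)/

# Hypothetical style names for each marker.
STYLE = { "*" => :bold, "_" => :italic, "__" => :bold_italic, "%" => :variable }

def inline_spans(text)
  text.scan(INLINE).map { |marker, body| [STYLE[marker], body] }
end

p inline_spans("Likewise *for bold*, _for italic_, __for bold italic__ and %TWikiVariables% too.")
```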

With "optimizations" like this, I'd guess about 15 regexps in total.
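
If the regexps stay separate, the StringScanner approach mentioned at the top
is one way to drive them. At each position it tries an ordered list of rules
and takes the first that matches. Here is a minimal sketch. The rule list is
hypothetical and much shorter than the real one would be. The catch-all last
rule guarantees the scanner always advances.

```ruby
require "strscan"

# Hypothetical ordered rules; the first one that matches at the current
# position wins, so more specific rules go first.
RULES_IN_ORDER = [
  [:variable, /%\w+%/],
  [:bold,     /\*[^*\s][^*]*\*/],
  [:word,     /\w+/],
  [:space,    /\s+/],
  [:other,    /./m],   # catch-all: always consumes one character
]

def tokenize(text)
  ss = StringScanner.new(text)
  tokens = []
  until ss.eos?
    RULES_IN_ORDER.each do |name, rx|
      if (piece = ss.scan(rx))   # scan only matches at the current position
        tokens << [name, piece]
        break
      end
    end
  end
  tokens
end

p tokenize("Use %TOPIC% and *bold text* here")
```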

> Now I'm getting really curious.  Care to post some more details?

I presume you mean the 1 to 10% savings?  I planned to do that anyway; I'll try
to put something on WikiLearn this weekend and then post a note here.

Randy Kramer