* Randy Kramer (Feb 27, 2005 16:00):
> My impression is that some or many wikis (this is my impression of
> TWiki) don't use a "real parser" (like YACC or whatever), but instead
> simply search and replace in the text using (many) Regular
> Expressions.  Conceptually, that seems an easy approach (less learning
> on my part, but probably tedious creation of many REs (or borrowing
> from TWiki's Perl).

Yes, this is a rather horrendous misuse of regular expressions.  It
works, but it's very brittle.  As Robert Klemme said, the biggest
problem isn't speed (although that is certainly a factor), but rather
that the interactions between the regular expressions may be
non-obvious.  With a proper grammar, it is clear how your data is
being parsed.
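
To make the interaction problem concrete, here is a minimal sketch.
The two rules are made up for illustration (they are not TWiki's
actual expressions); the point is only that each rule runs over the
output of the previous one:

```ruby
# Two made-up wiki rules, each a [pattern, replacement] pair.
ITALIC = [/_(.+?)_/, '<i>\1</i>']
LINK   = [%r{(https?://\S+)}, '<a href="\1">\1</a>']

# Apply each rule in turn to the output of the previous one.
def render(text, rules)
  rules.reduce(text) { |out, (pattern, replacement)| out.gsub(pattern, replacement) }
end

input  = "http://example.com/a_b_c"
wanted = '<a href="http://example.com/a_b_c">http://example.com/a_b_c</a>'

# Try both orderings; neither one produces the wanted markup.
[[ITALIC, LINK], [LINK, ITALIC]].each do |rules|
  out = render(input, rules)
  puts out
  puts(out == wanted ? "ok" : "mangled")
end
```

In either order the italic rule ends up rewriting the underscores
inside the URL, so the href is corrupted.  Fixing that with yet
another regular expression is where the brittleness comes from.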

The place for regular expressions is not in describing grammars, but
in describing tokens.
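
As a rough sketch of what I mean by that, here is a tokenizer built on
StringScanner from Ruby's standard library.  The token set (:star,
:newline, :text) is a hypothetical one picked for the example, not
anything taken from TWiki:

```ruby
require 'strscan'

# Split the input into [type, text] pairs.  Each regular expression
# describes one kind of token and nothing more.
def tokenize(input)
  ss = StringScanner.new(input)
  tokens = []
  until ss.eos?
    if ss.scan(/\*/)
      tokens << [:star, '*']
    elsif ss.scan(/\n/)
      tokens << [:newline, "\n"]
    elsif ss.scan(/[^*\n]+/)
      tokens << [:text, ss.matched]
    end
  end
  tokens
end

p tokenize("a *b* c")
```

The regular expressions only say what a token looks like; deciding
whether two stars form a bold span is left to the grammar.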

> ... I have a feeling that the TWiki markup language might not be
> "regular" enough to be parsed by something like YACC ...

Eh, "regular" is a weird word to use.  Regular expressions map onto
something called regular languages or regular sets.  Regular languages
are more constrained than context-free languages in what constructions
they allow (arbitrarily deep nesting, for example, is beyond them).
Context-free languages are what are usually described by a
parser-generator like YACC or RACC (the Ruby equivalent of the heavily
C-bound YACC).
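
A small sketch of the difference, using nested parentheses as the
stand-in construction.  FLAT and balanced? are names I made up for the
example.  (Ruby's Regexp engine has extensions that go beyond regular
languages in the formal sense; the pattern below deliberately sticks
to the classic operators.)

```ruby
# A classic regular expression can only handle a fixed nesting depth.
# This one accepts parentheses nested at most one level deep.
FLAT = /\A(?:[^()]|\([^()]*\))*\z/

# Arbitrary depth needs something that can count (or recurse), which
# is what a context-free grammar gives you.
def balanced?(s)
  depth = 0
  s.each_char do |c|
    depth += 1 if c == '('
    depth -= 1 if c == ')'
    return false if depth < 0
  end
  depth.zero?
end

["(a)(b)", "(a(b))"].each do |s|
  puts "#{s}: flat=#{FLAT.match?(s)} balanced=#{balanced?(s)}"
end
```

You can always write a bigger expression for depth two, three and so
on, but never one for every depth.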

Most markup languages (and perhaps most non-natural languages) are
context-free in nature.  Thus, using a parser-generator like RACC fits
well with what you are trying to do.  

I couldn't say what kind of language TWiki's markup is, though, as I
am not familiar with its syntax/grammar.

>    * If I did create the proper grammar rules, would parsing using
>    something like YACC be faster than a bunch of RE replacements?

Yes.  For any non-trivial problem, this would be the case.  Note,
however, that you still need to feed tokens to [YR]ACC, and this can be
slow if done wrong.  Also, it may be appropriate to write your own
parser from scratch, if the grammar is simple enough.  A
recursive-descent parser is easier to implement and understand than
a state-machine one (like those created by [YR]ACC).  Of course, the
fact that they create the state-machine for you is good, but it may
still be troublesome to understand how the state-machine interacts
with the grammar rules.  This becomes obvious in a recursive-descent
parser.
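
Here is a minimal sketch of a hand-written recursive-descent parser,
to show how directly the code follows the grammar.  The grammar and
the TinyParser class are hypothetical, invented for this example:

```ruby
require 'strscan'

# Hypothetical grammar:
#   doc  := item*
#   item := TEXT | '[' doc ']'
class TinyParser
  def initialize(input)
    @ss = StringScanner.new(input)
  end

  def parse
    tree = parse_doc
    raise ArgumentError, "unexpected ']' at #{@ss.pos}" unless @ss.eos?
    tree
  end

  private

  # doc := item*
  def parse_doc
    items = []
    while (item = parse_item)
      items << item
    end
    items
  end

  # item := TEXT | '[' doc ']'
  # Returns nil when no item starts here (end of input or a ']').
  def parse_item
    if (text = @ss.scan(/[^\[\]]+/))
      text
    elsif @ss.scan(/\[/)
      inner = parse_doc
      @ss.scan(/\]/) or raise ArgumentError, "missing ']' at #{@ss.pos}"
      [:group, inner]
    end
  end
end

p TinyParser.new("a[b[c]]d").parse
```

There is one method per grammar rule, and the nesting in the input is
handled by parse_item calling back into parse_doc.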

>    * Any recommendations for a parser in Ruby?  I think there are a
>    couple, I've been doing some Googling / reading and have come
>    across references to parse.rb and (iirc) something called Coco
>    (??).

Check out RACC,
	nikolai

-- 
::: name: Nikolai Weibull    :: aliases: pcp / lone-star / aka :::
::: born: Chicago, IL USA    :: loc atm: Gothenburg, Sweden    :::
::: page: www.pcppopper.org  :: fun atm: gf,lps,ruby,lisp,war3 :::
main(){printf(&linux["\021%six\012\0"],(linux)["have"]+"fun"-97);}