* Randy Kramer (Feb 27, 2005 16:00):
> My impression is that some or many wikis (this is my impression of
> TWiki) don't use a "real parser" (like YACC or whatever), but instead
> simply search and replace in the text using (many) Regular
> Expressions. Conceptually, that seems an easy approach (less learning
> on my part, but probably tedious creation of many REs (or borrowing
> from TWiki's Perl).

Yes, this is a rather horrendous misuse of regular expressions. It works, but it's very brittle. As Robert Klemme said, the biggest problem isn't speed (although that is certainly a factor), but rather that the interactions between the regular expressions may be non-obvious. With a proper grammar it is clear how your data is being parsed. The place for regular expressions is not in describing grammars, but in describing tokens.

> ... I have a feeling that the TWiki markup language might not be
> "regular" enough to be parsed by something like YACC ...

Eh, "regular" is a weird word to use here. Regular expressions describe what are called regular languages (or regular sets). Regular languages are more constrained in the constructions they allow than context-free languages; arbitrarily deep nesting, for example, is beyond them. Context-free languages are what a parser generator like YACC or RACC (the Ruby equivalent of the heavily C-bound YACC) usually describes. Most markup languages (and perhaps most non-natural languages) are context-free in nature. Thus, using a parser generator like RACC fits well with what you are trying to do. I couldn't say what kind of language TWiki's markup is, though, as I am not familiar with its syntax/grammar.

> * If I did create the proper grammar rules, would parsing using
> something like YACC be faster than a bunch of RE replacements?

Yes. For any non-trivial problem, this would be the case. Note, however, that you still need to feed tokens to [YR]ACC, and this can be slow if done wrong.
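To make "regular expressions for tokens, not grammars" concrete, here is a minimal sketch of a tokenizer built on Ruby's standard StringScanner. The markup it recognises is made up for illustration (it is not TWiki's), and the token names and the `tokenize` helper are assumptions of this sketch. The scanner walks the input once from left to right, instead of re-scanning the whole text for every expression:

```ruby
require 'strscan'

# Hypothetical token rules for a made-up wiki-like markup (not TWiki's).
# Each regular expression describes one kind of token, never a grammar rule.
# Rules are tried in order; the first one that matches wins.
TOKEN_RULES = [
  [:BOLD,    /\*/],
  [:LBRACK,  /\[\[/],
  [:RBRACK,  /\]\]/],
  [:NEWLINE, /\n/],
  [:TEXT,    /[^*\[\]\n]+/],
  [:TEXT,    /[\[\]]/]          # a lone bracket is treated as plain text
].freeze

# Turn the input into an array of [type, text] pairs in a single pass.
def tokenize(input)
  scanner = StringScanner.new(input)
  tokens = []
  until scanner.eos?
    # StringScanner#scan only matches at the current position and
    # advances past the match, so the input is never searched twice.
    type, _regex = TOKEN_RULES.find { |_, regex| scanner.scan(regex) }
    raise "unexpected input at offset #{scanner.pos}" unless type
    tokens << [type, scanner.matched]
  end
  tokens
end

p tokenize("*hi* [[Link]]\n")
```

A parser, whether generated or hand-written, would then consume these pairs one at a time; with RACC that would typically happen in the `next_token` method you supply.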
Also, it may be appropriate to write your own parser from scratch, if the grammar is simple enough. A recursive-descent parser is easier to implement and understand than a state-machine one (like those created by [YR]ACC). Of course, it is good that they create the state machine for you, but it may still be hard to understand how the generated state machine relates to the grammar rules. In a recursive-descent parser this is obvious.

> * Any recommendations for a parser in Ruby? I think there are a
> couple, I've been doing some Googling / reading and have come
> across references to parse.rb and (iirc) something called Coco
> (??).

Check out RACC,
        nikolai

-- 
::: name: Nikolai Weibull    :: aliases: pcp / lone-star / aka :::
::: born: Chicago, IL USA    :: loc atm: Gothenburg, Sweden    :::
::: page: www.pcppopper.org  :: fun atm: gf,lps,ruby,lisp,war3 :::
main(){printf(&linux["\021%six\012\0"],(linux)["have"]+"fun"-97);}
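
P.S. A rough sketch of the recursive-descent point, assuming a made-up bracket markup such as `[b bold [i both]]` (hypothetical, not TWiki's). The class and method names are inventions of this sketch. Each grammar rule becomes one method, so the way the rules interact can be read straight off the code:

```ruby
require 'strscan'

# Grammar for the made-up markup (an assumption of this sketch):
#   content := (element | text)*
#   element := '[' name ' ' content ']'
#   text    := one or more characters other than '[' and ']'
class TinyMarkupParser
  def initialize(input)
    @s = StringScanner.new(input)
  end

  # Parse the whole input into a nested array of strings and [name, children].
  def parse
    tree = content
    raise "unexpected ']' at offset #{@s.pos}" unless @s.eos?
    tree
  end

  private

  # content := (element | text)*
  def content
    nodes = []
    loop do
      if @s.check(/\[/)
        nodes << element              # recursion handles the nesting
      elsif (text = @s.scan(/[^\[\]]+/))
        nodes << text
      else
        break                         # ']' or end of input: caller decides
      end
    end
    nodes
  end

  # element := '[' name ' ' content ']'
  def element
    @s.scan(/\[/)
    name = @s.scan(/\w+/) or raise "expected element name at offset #{@s.pos}"
    @s.scan(/ /)
    children = content
    @s.scan(/\]/) or raise "missing ']' at offset #{@s.pos}"
    [name.to_sym, children]
  end
end

p TinyMarkupParser.new("a [b bold [i both]] z").parse
```

Note that the nesting is handled by `content` and `element` calling each other, which is exactly the part a pile of search-and-replace regular expressions cannot express reliably.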