Thanks to everyone who responded on this thread, and especially to 
Aredridel--the examples here helped me understand at least two varieties of 
parsers considerably better than I did.

There is a followup for Aredridel below.

I think I'm heading in the direction of writing something that might be called 
a parser (or might not be).  My basic aim is to minimize the number of passes 
through the document.  (Simple RE substitution requires a pass through the 
document for each RE (IIUC).)  I think I can do it with one or two passes, 
with a rather elaborate case type statement (or many ifs) on examination of 
each "token" (word).
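Just to make the idea concrete, here is a rough sketch of what I mean by a single-pass, per-token dispatch (in Python; the markup rules are only illustrative, loosely TWiki-like *bold* and _italic_, not the real rule set):

```python
import re

def convert(text):
    """Single pass: tokenize once on whitespace, then dispatch per token,
    instead of one full regex-substitution pass per markup rule."""
    out = []
    # Capturing group keeps the whitespace tokens so the output reassembles exactly.
    for token in re.split(r'(\s+)', text):
        if len(token) > 2 and token.startswith('*') and token.endswith('*'):
            out.append('<b>' + token[1:-1] + '</b>')    # *bold* -> <b>bold</b>
        elif len(token) > 2 and token.startswith('_') and token.endswith('_'):
            out.append('<i>' + token[1:-1] + '</i>')    # _italic_ -> <i>italic</i>
        else:
            out.append(token)
    return ''.join(out)
```

So each token is examined exactly once, and the case statement (here an if/elif chain) grows with the number of markup rules rather than the number of passes. Multi-token constructs like lists would of course need state carried between tokens, which is where it gets elaborate.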

I've written some other questions to the list.  I'm going through TWiki and 
trying to develop a list of all the TWiki markup--there is a lot I don't use 
(so far) and I may initially deal only with the more common stuff.  

(In the back of my mind I am still reserving some last resort options--I can 
always install an instance of TWiki locally, and "rig" a way to feed markup 
to it and recover the HTML.)


On Tuesday 01 March 2005 01:57 am, Aredridel wrote:
> The funky cases come in things not easily tokenized: lists,
> particularly, are a pain. My current parser does ugly things like
> guess whether something is list-like, hands the whole blob to a
> listifier routine, which then throws out any plaintext bits for
> rendering. Ugly, but works.

How are your lists marked up?  (Or are you trying to recognize them without a 
specific markup, more of a natural language type thing?)

thanks again to all,
Randy Kramer