Thanks to everyone who responded on this thread, and especially to Aredridel--the examples here helped me understand at least two varieties of parsers considerably better than I did. There is a follow-up question for Aredridel below.

I think I'm heading in the direction of writing something that might be called a parser (or might not be). My basic aim is to minimize the number of passes through the document. (Simple RE substitution requires a pass through the document for each RE (IIUC).) I think I can do it with one or two passes, using a rather elaborate case-type statement (or many ifs) to examine each "token" (word).

I've written some other questions to the list.

I'm going through TWiki and trying to develop a list of all the TWiki markup--there is a lot I don't use (so far), and I may initially deal only with the more common stuff.

(In the back of my mind I am still reserving some last-resort options--I can always install an instance of TWiki locally and "rig" a way to feed markup to it and recover the HTML.)

On Tuesday 01 March 2005 01:57 am, Aredridel wrote:
> The funky cases come in things not easily tokenized: lists,
> particularly, are a pain. My current parser does ugly things like
> guess whether something is list-like, hands the whole blob to a
> listifier routine, which then throws out any plaintext bits for
> rendering. Ugly, but works.

How are your lists marked up? (Or are you trying to recognize them without a specific markup, more of a natural-language type of thing?)

thanks again to all,
Randy Kramer
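
P.S. To make the one-pass idea above a bit more concrete, here is a rough sketch (in Python, purely for illustration) of folding several REs into one alternation, so that the text is scanned once and each match is dispatched by kind, instead of running one substitution pass per RE. The names `TOKEN`, `TAGS`, and `render` are made up for this example, and the three inline forms handled (`*bold*`, `_italic_`, `=fixed=`) are only a small assumed subset of the markup. It does no HTML escaping and ignores lists and other block-level markup entirely.

```python
import re

# Assumed subset of inline markup: *bold*, _italic_, =fixed=.
# All patterns live in ONE alternation with named groups, so the
# document is scanned a single time rather than once per pattern.
TOKEN = re.compile(
    r"\*(?P<bold>[^*\s][^*]*)\*"
    r"|_(?P<italic>[^_\s][^_]*)_"
    r"|=(?P<fixed>[^=\s][^=]*)="
)

# Hypothetical mapping from token kind to HTML tag; this table plays
# the role of the "case statement" on each token.
TAGS = {"bold": "b", "italic": "i", "fixed": "code"}


def render(text):
    """Convert the assumed inline markup to HTML in a single scan."""
    def dispatch(match):
        kind = match.lastgroup  # name of the alternative that matched
        tag = TAGS[kind]
        return "<%s>%s</%s>" % (tag, match.group(kind), tag)

    return TOKEN.sub(dispatch, text)


if __name__ == "__main__":
    print(render("a *b* and _c_ =d="))
```

Adding another inline form would mean one more named alternative plus one more entry in the table, with no extra pass over the document. Lists probably would not fit this scheme as-is, since they span lines rather than single tokens.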