I want to do something similar to what I've seen on Slashdot, where the
user is allowed to enter HTML for responses to articles, but the set of
tags that are acceptable is only a subset of full HTML.

Now, rather than trying to do something like edit the string to remove
any matches of UNacceptable tags, I'd prefer to be able to define which
ones ARE acceptable and then edit the string to remove anything else
that looks like a tag but doesn't match the ones I've decided are OK.

My logic here is that I can easily define the list of those I'm happy to
have, but that would be a relatively small subset of the entire range of
tags.

So, can anyone suggest the right way to approach this?

As I say, if I were going to remove the ones I didn't accept, that would
be easy, because I'd just do a bunch of gsub()s (or maybe even a single
more complex gsub()) on the string.  As far as I can see, the only easy
way to do it the other way around is to write some kind of small lexical
analyser that, each time you call it, returns either a tag or some
intermediate text.  I can then build up the edited text by concatenating
the returned "tokens", eliminating those I don't want as I go.

However, this sounds a little messy to me, unless there's already
some class that can assist in doing it.

TIA