> For example, the
> following snippet is a perfectly well-formed and valid HTML document,
> but none of the regexps posted in this thread so far are able to
> correctly parse it:
>
> <HTML/
> <HEAD/
> <TITLE/>/
> <P/>
>
> Oh, and, no, there is nothing missing there (well, except for the
> DOCTYPE declaration, I left that out for brevity -- this snippet is
> valid HTML 2.0, HTML 3.2 and HTML 4.01), that is actually a complete,
> well-formed and valid HTML document.
>

True, but most web sites are more likely to be malformed than they are
to be unparsably complex. If a regex will work predictably for one type
of page on one web site, perhaps a parser might be overkill.

Dan