Austin Ziegler <halostatue / gmail.com> writes:
> On 09/01/06, Eric Schwartz <emschwar / mail.ericschwartz.us> wrote:
> > More like, "Just pray the HTML you are modifying doesn't happen to be
> > completely valid, but not formed in exactly the way you are
> > expecting."  For instance, the following HTML snippet is completely
> > valid, but screws up the regex:
> >
> > <p>a <img src="greaterthan.gif" alt=">" /> b</p>
> 
> Actually, that is *not* completely valid, at least not valid XHTML
> (which is what I use these days).

When wrapped with the appropriate tags, it validated as HTML 4.01, which
is what I recommend most people generate these days (because of some,
but not all, of the reasons elucidated at
http://codinginparadise.org/weblog/2005/08/xhtml-considered-harmful.html).
So yes, it is valid HTML, which is all I claimed it to be.

I specifically didn't mention XHTML, since the bits of the thread I
saw referenced HTML, and the two are different enough that I figured XHTML
would have been mentioned if that's what was wanted.  Of course, with
XHTML, you have CDATA sections, which can contain all sorts of
nastiness that can trip you up just as badly.
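
To make the CDATA point concrete, here is a minimal sketch.  It is in
Python purely for illustration, and the tag-stripping regex is a naive
one I made up for the example, not anything posted in this thread.  The
names XHTML, NAIVE_TAG, text_via_regex and text_via_parser are likewise
hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# A well-formed fragment whose CDATA section contains "<" and
# something that merely looks like a tag.
XHTML = '<p>a <![CDATA[1 < 2 and <b> is not a tag here]]> b</p>'

# Hypothetical naive tag stripper (an assumption for illustration).
NAIVE_TAG = re.compile(r'<[^>]*>')


def text_via_regex(markup):
    """Remove anything that looks like <...>, knowing nothing of CDATA."""
    return NAIVE_TAG.sub('', markup)


def text_via_parser(markup):
    """Let a real XML parser decide what is markup and what is text."""
    return ''.join(ET.fromstring(markup).itertext())


if __name__ == '__main__':
    print(repr(text_via_regex(XHTML)))
    print(repr(text_via_parser(XHTML)))
```

The regex treats the start of the CDATA section as the start of a tag,
eats the text up to the first ">", and leaves a stray "]]>" behind.  The
parser returns the character data intact.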

> You have to do that as:
>   <p>a <img src="greaterthan.gif" alt="&gt;" /> b</p>
> 
> But my regexp wasn't intended to be complete; there are full libraries
> out there for that.

Right; my point was that in my experience, regexes seem to work just
fine, until suddenly they don't, and then you have to spend silly
amounts of time compensating for them.  Or you could just use a proper
library in the first place and not have to worry about it.
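
For what it's worth, here is a rough sketch of that failure on the
snippet from earlier in the thread.  It is in Python only for
illustration; the regex is a naive stand-in I am assuming for the sake
of the example (it is not Austin's), and TextExtractor,
strip_with_regex and strip_with_parser are hypothetical names.

```python
import re
from html.parser import HTMLParser

SNIPPET = '<p>a <img src="greaterthan.gif" alt=">" /> b</p>'

# Hypothetical naive tag stripper: "<", anything but ">", then ">".
NAIVE_TAG = re.compile(r'<[^>]*>')


class TextExtractor(HTMLParser):
    """Collect only the text nodes the parser reports."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_with_regex(html):
    # Stops at the ">" inside the alt attribute, so part of the tag leaks.
    return NAIVE_TAG.sub('', html)


def strip_with_parser(html):
    # The parser understands quoted attribute values.
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return ''.join(parser.chunks)


if __name__ == '__main__':
    print(repr(strip_with_regex(SNIPPET)))
    print(repr(strip_with_parser(SNIPPET)))
```

The regex version leaves the tail of the img tag in the output, because
it ends the match at the ">" inside the quoted alt value.  The parser
version yields only the "a" and "b" text on either side of the image.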

-=Eric