Wes Gamble wrote:
> All,
> 
> I am attempting to do some matching on some HTML.
> 
> Here's what I want:
> 
> I want to be able to match any <area> tag which is of the form <area 
> ...> that does NOT contain a "mailto:" href.  The grouping is for some 
> substitution that I'm doing.
> 
> Here's my pattern:
> 
> /(<area .*?href=['|"])(?!mailto:)(.*?)(['|"].*?>)/mi
> 
> What I find is that this pattern will successfully handle most area 
> tags. However, when an area tag is followed by an <a> tag (which also 
> has an href attribute), it will match everything between <area and the 
> end of the <a> tag.  So, for example,
> 
> <area href="mailto: xyz / abc.com>other stuff, including tags<a 
> href="blah">
> 
> this pattern matches all the way through the end of the <a> tag.
> 
> I tried to stick a negative lookahead (?!<) at the end of the pattern 
> but that doesn't seem to help.
> 
> How do I get this pattern to STOP matching?
> 
> Thanks for any help,
> Wes


This appears to work better:

/(<area [^>]*?)(href=['|"])(?!mailto:)(.*?)(['|"].*?>)/mi

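The "any character but '>'" class ([^>]*?) keeps the part of the match 
before href= from running past the end of the <area> tag, so the href 
that gets tested against (?!mailto:) has to belong to the <area> tag 
itself.  Note that the pattern now has four groups instead of three.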
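
Here is a small sketch comparing the two patterns. The sample HTML 
strings and the "/go/" prefix used in the substitution are made up for 
illustration; only the two patterns come from the posts above.

```ruby
# Original pattern: ".*?" before href= may run past the end of the <area> tag.
old_pattern = /(<area .*?href=['|"])(?!mailto:)(.*?)(['|"].*?>)/mi
# Revised pattern: "[^>]*?" cannot cross the ">" that closes the <area> tag.
new_pattern = /(<area [^>]*?)(href=['|"])(?!mailto:)(.*?)(['|"].*?>)/mi

# Hypothetical input: a mailto <area> followed by an ordinary <a> tag.
html = '<area href="mailto:xyz@abc.com">other stuff<a href="blah">'

old_match = html.match(old_pattern)
new_match = html.match(new_pattern)

p old_match[0] == html  # true: the old pattern runs through the <a> tag
p new_match             # nil: the mailto <area> is no longer matched

# Hypothetical non-mailto <area> tag, to show the four groups.
plain = '<area shape="rect" href="page.html" alt="x">'
ok = plain.match(new_pattern)
p ok[1]  # "<area shape=\"rect\" "
p ok[3]  # "page.html"

# Example substitution using the groups (the "/go/" prefix is an assumption).
rewritten = plain.gsub(new_pattern) { "#{$1}#{$2}/go/#{$3}#{$4}" }
puts rewritten
```

One side note: inside a character class the | is a literal character, 
so ['|"] also accepts a pipe as the quote.  ['"] would probably be 
enough.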

Wes

-- 
Posted via http://www.ruby-forum.com/.