Robert Klemme wrote:
> If you provide more detail about the input and the text that you want
> to match we might be able to help fix the regular expression.  IMHO
> that approach is superior to simply returning the match effectively
> replacing it with itself (which does work of course).

self.html.gsub!(/<a\s+?[^>]*?href=(['"])   #<a up to and including href=' or href="
                (?!mailto:)(.*?)            #Contents of any non-mailto: href attribute
                \1.*?>                      #End of href attribute (same quote) + arbitrary text to end of opening <a> tag
                (.*?)                       #Contents of <a> - the "link display"
                <\\?\/a>/mix) {             #Closing </a> tag, allowing for optional \, e.g. </a> or <\/a>

So, this regex is attempting to pull out the value of the href attribute 
of an <a> tag, as well as the content enclosed by that <a> tag.

The problem comes when it encounters a particularly nefarious kind of 
HTML which looks like this:

<a href="x"><div>....<a href="x"></a>....</div>

and there is no closing </a> for the first anchor.  What I want to pull 
out is the _valid_ <a> tag "on the inside", but what I get is everything 
from the first <a> tag up to the closing </a> tag, which is not correct.  
The first <a> tag just shouldn't be there at all.
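To make the failure concrete, here is a minimal reproduction using the sample markup above. It assumes the regex is stored in a constant (ORIGINAL_RE is a name made up for this sketch) instead of being passed straight to gsub!, and the comments inside the pattern are reworded so that they contain no slash or quote characters.

```ruby
# ORIGINAL_RE is a hypothetical name; the pattern itself is the one from the post.
ORIGINAL_RE = /<a\s+?[^>]*?href=(['"])   # opening tag up to href= and its quote character
               (?!mailto:)(.*?)          # contents of any non-mailto: href attribute
               \1.*?>                    # same quote again, then the rest of the opening tag
               (.*?)                     # contents of the anchor, the link display
               <\\?\/a>                  # closing tag, with an optional backslash before the slash
              /mix

html = '<a href="x"><div>....<a href="x"></a>....</div>'
m = ORIGINAL_RE.match(html)

puts m.begin(0)  # 0, so the match starts at the outer, unclosed anchor
puts m[2]        # x
puts m[3]        # <div>....<a href="x">
```

The lazy (.*?) for the link display is free to run across the second opening <a> tag, so the first </a> it reaches closes a match that began at the outer anchor.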

So I need to modify my regex so that it does not match when there is 
another <a> tag inside the one being matched.  I tried for about 30 
minutes yesterday using a (?!...) negative lookahead assertion, but 
couldn't quite get it.
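One possible shape for that assertion, offered as a sketch and not a tested fix for real-world pages: put the negative lookahead in front of every character the link display consumes, so the capture can never step over another opening <a. In this sketch the two other lazy wildcards are tightened as well, because otherwise the href capture or the tail of the opening tag can backtrack across the inner tag and produce the same bad match. LINK_RE is a name made up for the example, and the sample markup uses "y" and "inner" only to tell the two anchors apart.

```ruby
# LINK_RE is a hypothetical name; the pattern is a variation on the original regex.
LINK_RE = /<a\s+[^>]*?href=(['"])        # opening tag up to href= and its quote character
           (?!mailto:)((?:(?!\1).)*)     # href contents: anything up to the same quote character
           \1[^>]*>                      # same quote again, then the rest of this opening tag only
           ((?:(?!<a\b).)*?)             # link display: characters that do not begin another anchor
           <\\?\/a>                      # closing tag, with an optional backslash before the slash
          /mix

html = '<a href="x"><div>....<a href="y">inner</a>....</div>'

# scan returns one array of captures per match: quote character, href, link display.
html.scan(LINK_RE).each do |_quote, href, display|
  puts "href=#{href} display=#{display}"
end
```

With this pattern the outer, unclosed anchor fails to match because no </a> can be reached without crossing the inner <a, so the scan moves on and picks up only the inner link. It still cannot cope with every kind of malformed markup; an HTML parser would be the more robust route if one is available.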

Thanks,
Wes

-- 
Posted via http://www.ruby-forum.com/.