RM> I am trying to extract the href from links in HTML, however the
RM> regular expression matcher doesn't appear to stop in the correct
RM> place.

RM> I intend the regular expression to extract the href that is
RM> enclosed in quotes and return that into $1. However it seems to
RM> 'miss' the first set of quotes and a following one and finally
RM> stop on a third set.

RM> Here is an example

RM> s="<A  href=\"l.htm\">xxx <IMG  src=\"images/la.gif\" width=4></A>"

RM> s =~ /<.*A.*href *= *"(.*)".*>/   => 0

RM> $1   =>  "l.htm\">xxx <IMG  src=\"images/la.gif"


s = '<A  href="l.htm">xxx <IMG  src="images/la.gif" width=4></A>'
s =~ /<a\s+href\s*=\s*\"*([^>\"\s]+).*/i
p $1

The above captures everything after the href= up to the first quote,
whitespace, or > (an opening quote, if present, is skipped and not
captured).  Is that what you were looking for?
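
As for why the original pattern overshoots: as far as I can tell it is
because .* is greedy, so "(.*)" runs on to the last quote that still
lets the rest of the pattern match. Here is a minimal sketch comparing
it with two common alternatives, a negated character class and a
non-greedy .*? (the variable names greedy, negated, and lazy are only
for illustration, and both alternatives assume the href value is
always double-quoted):

```ruby
s = '<A  href="l.htm">xxx <IMG  src="images/la.gif" width=4></A>'

# Greedy: "(.*)" extends to the last quote that still lets ".*>" match,
# so the capture swallows the IMG tag's src attribute as well.
greedy = s[/<.*A.*href *= *"(.*)".*>/, 1]

# Negated class: [^"]* cannot cross a quote, so the capture ends at the
# first closing quote after href=.
negated = s[/<a\s+href\s*=\s*"([^"]*)"/i, 1]

# Non-greedy: .*? takes as few characters as possible before the next quote.
lazy = s[/<a\s+href\s*=\s*"(.*?)"/i, 1]

p greedy
p negated
p lazy
```

The negated class is usually the safer of the two, since it cannot
cross a quote no matter what follows it in the pattern.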

regards,
-joe