Hi,

On Sun, 24 Mar 2002 07:36:20 +0900
"Max Maischein" <corion / informatik.uni-frankfurt.de> wrote:

> [...trying to parse HTML via REs...]
> > > a =~ /(<a.+?href=['"](.+?)['"].*?>)(.+?)(<\/a>)/m
> >
> > If Ruby's RE engine works a little bit like Perl's, a regex like
> > /(<a.+?href=['"]([^"']+)['"][^>]*>)(.+?)(<\/a>)/
> >
> > will be faster and more efficient.
> But also wrong, as it won't parse a URL like this correctly:
> 
> Click here

That's right ;-) AFAIK HTML allows attribute values in either ' or ",
but the opening and closing quote have to match. If we only handle
values in ", we can correct it to

/(<a.+?href="([^"]+)"[^>]*>)(.+?)(<\/a>)/
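
A quick sketch of what this pattern captures, in case it helps. The
sample strings are made up for illustration and are not from this
thread, and LINK_RE is just a name I picked:

```ruby
# The pattern from above. Capture groups are
# 1: opening tag, 2: href value, 3: link text, 4: closing tag.
LINK_RE = /(<a.+?href="([^"]+)"[^>]*>)(.+?)(<\/a>)/

# Hypothetical sample input (an assumption, not from the thread).
html = %q{<p><a class="x" href="http://www.ruby-lang.org/">Ruby</a></p>}

if (m = LINK_RE.match(html))
  href = m[2]   # the URL between the double quotes
  text = m[3]   # the text between <a ...> and </a>
end

# A value in single quotes is not matched by this pattern at all.
single = %q{<a href='http://www.ruby-lang.org/'>Ruby</a>}
single_match = LINK_RE.match(single)
```

Note that [^"]+ no longer stops at a ' inside the URL, which is the
point of the change.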

> (and also some other weird stuff that I can think of). The best way
> IMO is to simply parse the HTML with an HTML parser

Yes, I think so, too. Parsing HTML with only regular expressions seems
to be a little bit silly, IMHO ;-)
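
To show the difference, here is a minimal sketch of a scanner that walks
a tag attribute by attribute instead of matching it in one go. It uses
StringScanner from Ruby's strscan library. extract_hrefs is a
hypothetical helper of mine, and this is only a toy: it knows nothing
about comments, scripts or entities, so a real HTML parser is still the
better choice.

```ruby
require 'strscan'

# Toy scanner (an assumption for illustration, not a real HTML parser):
# collects href values from <a ...> tags, accepting values in ' or ".
def extract_hrefs(html)
  hrefs = []
  s = StringScanner.new(html)
  # Jump to the start of each <a ...> tag in turn.
  while s.skip_until(/<a\b/i)
    # Walk the attributes one at a time until the tag closes.
    until s.eos? || s.scan(/\s*\/?>/)
      if s.scan(/\s*([\w:-]+)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))/)
        name  = s[1]
        value = s[2] || s[3] || s[4]
        hrefs << value if name.downcase == 'href'
      elsif s.scan(/\s*[\w:-]+/)
        next      # attribute without a value
      else
        s.getch   # skip anything unexpected
      end
    end
  end
  hrefs
end

# Hypothetical inputs: a ' inside a "-quoted value, and the reverse.
urls = extract_hrefs(%q{<a href="javascript:alert('hi')">Click here</a>} +
                     %q{<a class='x' href='/search?q="ruby"'>Search</a>})
```

Because each value is read up to its own closing quote, the other kind
of quote inside the value does no harm.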

Greets,
 CK