Hi,

On Sun, 24 Mar 2002 07:36:20 +0900
"Max Maischein" <corion / informatik.uni-frankfurt.de> wrote:

> [...trying to parse HTML via REs...]
>
> > a =~ /(<a.+?href=['"](.+?)['"].*?>)(.+?)(<\/a>)/m
> >
> > If Ruby's RE engine works a little like Perl's, a regex like
> >
> > /(<a.+?href=['"]([^"']+)['"][^>]*>)(.+?)(<\/a>)/
> >
> > will be faster and more efficient.
>
> But also wrong, as it won't parse a URL like this correctly:
>
> Click here

That's right ;-) But AFAIK attribute values in single quotes are forbidden; they have to be in double quotes. So we can correct it to:

/(<a.+?href="([^"]+)"[^>]*>)(.+?)(<\/a>)/

> (and also some other weird stuff that I can think of). The best way
> IMO is to simply parse the HTML with an HTML parser.

Yes, I think so, too. Parsing HTML with regular expressions alone seems a little bit silly, IMHO ;-)

Greets,
CK
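A small Ruby sketch that runs the three patterns from this thread against two hypothetical anchors (the sample HTML and the `patterns`/`samples` names are illustrative only, not taken from the original messages). The first sample has a single quote inside a double-quoted URL: both quote-agnostic patterns cut the URL off at that quote, while the double-quote-only pattern keeps it whole. The second sample uses a single-quoted attribute value, which HTML 4.01 does permit; there the double-quote-only pattern finds no link at all, which is one more argument for a real HTML parser.

```ruby
# Hypothetical sample anchors, not from the thread.
samples = [
  %q{<a href="foo'bar.html">Click here</a>}, # single quote inside a double-quoted URL
  %q{<a href='page.html'>Next</a>}           # single-quoted attribute value
]

# The three patterns discussed above, copied unchanged.
patterns = {
  lazy:    /(<a.+?href=['"](.+?)['"].*?>)(.+?)(<\/a>)/m,
  negated: /(<a.+?href=['"]([^"']+)['"][^>]*>)(.+?)(<\/a>)/,
  dq_only: /(<a.+?href="([^"]+)"[^>]*>)(.+?)(<\/a>)/
}

samples.each do |html|
  puts html
  patterns.each do |name, re|
    m = re.match(html)
    # Capture group 2 holds the URL; a nil MatchData means no link was found.
    puts "  #{name}: #{m ? m[2].inspect : 'no match'}"
  end
end
```

Tracing the patterns by hand, the lazy and negated patterns report `"foo"` for the first sample, and the double-quote-only pattern reports `"foo'bar.html"` for the first sample but `no match` for the second.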