Simon Strandgaard wrote:
> On 12/27/06, gaurav bagga <gaurav.v.bagga / gmail.com> wrote:
> > I was just going through regexps.
> > The lines below hang irb, and CPU usage goes to 100% on Windows XP.
> > Can anyone explain to me why?
> >
> >
> > irb(main):001:0> r=/^(https?:\/\/)?[a-z0-9]+([\.\-\_=&\+\/\?]?[a-z0-9]+)+$/i
> > => /^(https?:\/\/)?[a-z0-9]+([\.\-\_=&\+\/\?]?[a-z0-9]+)+$/i
> > irb(main):002:0> "http://groups-beta.google.com/group/rubyonrails-talk/browse_thread/thread/8f085b191387d799/e78a71cbd7354c0c#e78a71cbd7354c0c"=~r
>
>
> the GNU regexp engine has a few oddities..
> here is another example that triggers an endless loop.
>
> r='<META http-equiv="Content-Type content="text/html; charset=iso-8859-1">'
> r.scan /<(?:[^">]+|"[^"]*")+>/
>
> we have in common that our regexp has nested repeating patterns.

Much faster, using atomic groups (?> ... ) so the engine cannot
backtrack into the repeated parts:

r = %r{
   ^
   (https?://)?
   (?> [a-z0-9]+ )
   (?> [-._=&+/?]?  [a-z0-9]+  )+
   $
  }xi
url = "http://groups-beta.google.com/group/rubyonrails-talk/" +
  "browse_thread/thread/8f085b191387d799/" +
  "e78a71cbd7354c0c#e78a71cbd7354c0c"
p(url =~ r)   # prints nil

It returns nil right away. The URL doesn't match, because the # is not
in the separator character class [-._=&+/?].
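
If the fragment part should be accepted, one possible tweak is to add # to the separator class. This is only a sketch: whether # ought to be allowed is my assumption, not something the original poster asked for, and the constant name URL_WITH_FRAGMENT is made up for the example. The same atomic-group trick also seems to tame the quoted <META ...> scan.

```ruby
# Sketch only: the pattern from this post with '#' added to the separator
# class. Accepting '#' is an assumption; the original post never says so.
URL_WITH_FRAGMENT = %r{
  ^
  (https?://)?
  (?> [a-z0-9]+ )                  # atomic: never give back characters
  (?> [-._=&+/?\#]? [a-z0-9]+ )+   # optional separator, then a word
  $
}xi

url = "http://groups-beta.google.com/group/rubyonrails-talk/" \
      "browse_thread/thread/8f085b191387d799/" \
      "e78a71cbd7354c0c#e78a71cbd7354c0c"

p(url =~ URL_WITH_FRAGMENT)   # 0, the match starts at the first character

# Same idea applied to the quoted <META ...> example: make the group atomic.
meta = '<META http-equiv="Content-Type content="text/html; charset=iso-8859-1">'
p meta.scan(/<(?>[^">]+|"[^"]*")+>/)   # [], the unbalanced quote prevents a match
```

Because each group is atomic, a failed attempt cannot be retried by splitting the same run of characters in a different way, so a non-matching input is rejected quickly instead of after a huge number of retries.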