Simon Strandgaard wrote:
> On 12/27/06, gaurav bagga <gaurav.v.bagga / gmail.com> wrote:
> > was just going trough reg exp
> > the lines below just hangs irb and cpu usage goes 100% on windows xp
> > can anyone explain me why?
> >
> >
> > irb(main):001:0> r=/^(https?:\/\/)?[a-z0-9]+([\.\-\_=&\+\/\?]?[a-z0-9]+)+$/i
> > => /^(https?:\/\/)?[a-z0-9]+([\.\-\_=&\+\/\?]?[a-z0-9]+)+$/i
> > irb(main):002:0> "
> > http://groups-beta.google.com/group/rubyonrails-talk/browse_th
> > read/thread/8f085b191387d799/e78a71cbd7354c0c#e78a71cbd7354c0c"=~r
> >
>
> the GNU regexp engine has a few oddities..
> here is another example that triggers an endless loop.
>
> r='<META http-equiv="Content-Type content="text/html; charset=iso-8859-1">'
> r.scan /<(?:[^">]+|"[^"]*")+>/
>
> we have in common that our regexp has nested repeating patterns.

Much faster:

r = %r{ ^ (https?://)?
  (?> [a-z0-9]+ )
  (?> [-._=&+/?]? [a-z0-9]+ )+ $ }xi

p ( "http://groups-beta.google.com/group/rubyonrails-talk/" +
  "browse_thread/thread/8f085b191387d799/" +
  "e78a71cbd7354c0c#e78a71cbd7354c0c") =~ r

It doesn't match (because of the #).
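For anyone who wants to try this, here is a rough sketch of the same idea. The STRICT pattern is the atomic-group regexp from above. WITH_HASH is a hypothetical variant, not from the thread, that assumes the original poster wanted '#' accepted as a separator. The last line applies the same atomic-group trick to Simon's META example, which is also my assumption rather than something he posted. The constant names are made up for illustration.

```ruby
# The URL from the original question, joined into one string.
LONG_URL = "http://groups-beta.google.com/group/rubyonrails-talk/" \
           "browse_thread/thread/8f085b191387d799/" \
           "e78a71cbd7354c0c#e78a71cbd7354c0c"

# Atomic-group pattern as posted: '#' is not an allowed separator.
STRICT = %r{ ^ (https?://)?
  (?> [a-z0-9]+ )
  (?> [-._=&+/?]? [a-z0-9]+ )+ $ }xi

# Hypothetical variant (assumption): '#' added to the separator class.
# It is escaped so the literal is never read as interpolation or a comment.
WITH_HASH = %r{ ^ (https?://)?
  (?> [a-z0-9]+ )
  (?> [-._=&+/?\#]? [a-z0-9]+ )+ $ }xi

p(LONG_URL =~ STRICT)     # => nil, and it fails quickly instead of hanging
p(LONG_URL =~ WITH_HASH)  # => 0

# Simon's example with the inner group made atomic (assumption, see above).
META = '<META http-equiv="Content-Type content="text/html; charset=iso-8859-1">'
p META.scan(/<(?>[^">]+|"[^"]*")+>/)  # => [] (the last quote is never closed)
```

The point of (?> ... ) is that once a run of [a-z0-9]+ has been consumed it is never handed back, so the engine cannot retry every possible way of splitting the same run between the inner and outer repetition when the overall match fails.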