> -----Original Message-----
> From: James Edward Gray II [mailto:james / grayproductions.net]
> Sent: Tuesday, September 13, 2005 12:31 PM
> To: ruby-talk ML
> Subject: Surprising Regexp Behavior
>
>
> I keep running into some surprising points with Ruby's Regexp engine
> today and this first one just looks plain wrong to me:
>
> irb(main):001:0> html = "<p>one</p>\n\n<p>two</p>"
> => "<p>one</p>\n\n<p>two</p>"
> irb(main):002:0> html.sub!(/<p>(.*?)<\/p>(.*)/) { $1.strip }
> => "one\n\n<p>two</p>"
> irb(main):003:0> $2
> => ""
>
> Can anyone explain to me how that isn't a bug?

What's the bug to you? The fact that the second set of <p></p> wasn't
stripped, or the fact that $2 is empty?

In the former case, sub != gsub: sub replaces only the first match. In
the latter, you need multi-line mode (/m) because of the "\n\n";
without it, "." does not match a newline, so (.*) stops right there:

# Without /m
irb(main):026:0> html =~ /<p>(.*?)<\/p>(.*)/
=> 0
irb(main):027:0> $1
=> "one"
irb(main):028:0> $2
=> ""

# With /m
irb(main):023:0> html =~ /<p>(.*?)<\/p>(.*)/m
=> 0
irb(main):024:0> $1
=> "one"
irb(main):025:0> $2
=> "\n\n<p>two</p>"

Regards,

Dan
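
For anyone who wants to try both points outside of irb, here is a minimal standalone sketch. It is not part of the original session: the variable names without_m and with_m are made up for illustration, and it uses the non-destructive sub/gsub (rather than sub!) so that html stays intact between calls.

```ruby
html = "<p>one</p>\n\n<p>two</p>"

# Without /m, "." does not match "\n", so (.*) stops at the first newline.
without_m = html.match(/<p>(.*?)<\/p>(.*)/)

# With /m, "." also matches "\n", so (.*) runs to the end of the string.
with_m = html.match(/<p>(.*?)<\/p>(.*)/m)

p without_m[2]  # => ""
p with_m[2]     # => "\n\n<p>two</p>"

# sub replaces only the first match; gsub replaces every match.
p html.sub(/<p>(.*?)<\/p>/) { $1.strip }   # => "one\n\n<p>two</p>"
p html.gsub(/<p>(.*?)<\/p>/) { $1.strip }  # => "one\n\ntwo"
```

Using match and indexing the MatchData keeps each result attached to the pattern that produced it, which is a little easier to compare side by side than reading $1 and $2 after every =~.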