Hi --

On Wed, 14 Sep 2005, James Edward Gray II wrote:

> I keep running into some surprising points with Ruby's Regexp engine today 
> and this first one just looks plain wrong to me:
>
> irb(main):001:0> html = "<p>one</p>\n\n<p>two</p>"
> => "<p>one</p>\n\n<p>two</p>"
> irb(main):002:0> html.sub!(/<p>(.*?)<\/p>(.*)/) { $1.strip }
> => "one\n\n<p>two</p>"
> irb(main):003:0> $2
> => ""
>
> Can anyone explain to me how that isn't a bug?
>
> Here's another surprise, for me:
>
> irb(main):001:0> html = "<p>one</p>\n\n<p>two</p>"
> => "<p>one</p>\n\n<p>two</p>"
> irb(main):002:0> html.sub!(/<p>(.*?)<\/p>(.*)\Z/) { $1.strip }
> => "<p>one</p>\n\ntwo"
>
> Using an anchor there means that the left-most match doesn't win?

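Without /m, the dot doesn't match \n. In the first case that means
(.*) stops at the first newline and matches the empty string, so $2 is
"". In the second case no match starting at the first <p> can reach
\Z, because the dot can't cross the newlines, so the engine moves on
and the left-most match that actually succeeds starts at the second
<p>.

In both cases, if you use the /m modifier, the dot will match \n, and
I think the behavior you want will happen.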
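
Here's a small sketch of the difference, using your string and
patterns. The variable names are just mine for illustration, and I'm
assuming the result you're after is "one" with the rest of the string
captured in the second group:

```ruby
html = "<p>one</p>\n\n<p>two</p>"

# Without /m the dot stops at "\n", so the second group is empty.
without_m = html.match(/<p>(.*?)<\/p>(.*)/)
p without_m[2]   # => ""

# With /m the dot also matches "\n", so the second group takes the rest.
with_m = html.match(/<p>(.*?)<\/p>(.*)/m)
p with_m[2]      # => "\n\n<p>two</p>"

# Both of your substitutions now match from the first <p> to the end.
first  = html.sub(/<p>(.*?)<\/p>(.*)/m)   { $1.strip }
second = html.sub(/<p>(.*?)<\/p>(.*)\Z/m) { $1.strip }
p first          # => "one"
p second         # => "one"
```

I used the non-bang sub and match here only so that html stays
unchanged between examples.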


David

-- 
David A. Black
dblack / wobblini.net