Ron Jeffries <ronjeffries / REMOVEacm.org> writes:

> B) I didn't see that .*? was enough to make the whole regexp stop at the
> first match of some other part of the pattern. 

Ron:

The effect of '?' isn't global; it affects only the immediately
preceding '*' or '+'.

Consider

    %r{<a>.*</a>}

When matching <a>aaa</a> bbb <a>ccc</a>

The '.*' part is greedy, and so logically it consumes all possible
characters. When it gets to the end of the string, it then says "ok,
now match '</a>'", and fails (because it's at the end of the
string). So it backtracks one position, tries again, fails,
backtracks, fails, backtracks, fails, backtracks, and succeeds. The .*
matched 'aaa</a> bbb <a>ccc'.
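
If you want to see that for yourself, here's a rough sketch you can
paste into irb. The capture group and the variable names are my own
additions for illustration; they're not part of the pattern above.

```ruby
# Greedy case: .* grabs everything it can, then backtracks just far
# enough for '</a>' to match, which lands on the LAST '</a>'.
str = "<a>aaa</a> bbb <a>ccc</a>"

# Parentheses added (an assumption for this demo) so we can see
# exactly what the .* part ended up holding.
greedy = str.match(%r{<a>(.*)</a>})
greedy_inner = greedy[1]

puts greedy_inner   # prints: aaa</a> bbb <a>ccc
```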

When you say .*?, you're changing the semantics, so that it does the
reverse. The .*? first consumes no characters, and then says "am I
now looking at '</a>'?" Nope, so it consumes one, asks again. Still
fails. Consumes another, fails. Finally, just when you or I would have
given up and gone for a drink, it consumes another and matches. 'aaa'
is matched, and Ron is happy.
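
Again, a small sketch for irb, with the same caveat: the capture
groups, the variable names and the two-quantifier pattern are mine,
made up to show that '?' only touches the quantifier right before it.

```ruby
str = "<a>aaa</a> bbb <a>ccc</a>"

# Lazy case: .*? grows one character at a time and stops at the
# FIRST place where '</a>' can match.
lazy = str.match(%r{<a>(.*?)</a>})
lazy_inner = lazy[1]
puts lazy_inner            # prints: aaa

# Because each match now stops early, scan finds both elements.
all_inner = str.scan(%r{<a>(.*?)</a>}).flatten
p all_inner                # prints: ["aaa", "ccc"]

# The '?' is local: the first .* here is lazy, the second is still
# greedy and runs on to the last '</a>'.
mixed = str.match(%r{<a>(.*?)</a>(.*)</a>})
mixed_first  = mixed[1]
mixed_second = mixed[2]
p mixed_first              # prints: "aaa"
p mixed_second             # prints: " bbb <a>ccc"
```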

[You can cut this out and paste it into your book, making it a special
Jeffries edition pickaxe, which one day might be terribly, terribly
valuable.]


I'd seriously suggest getting Friedl if you're interested: it's a fine
book.


Dave