On Nov 22, 3:34 am, Raul Parolari <raulparol... / gmail.com> wrote:
> RichardOnRails wrote:
> > Hi Raul,
>
> > I like your "battle plan"..
> > I especially appreciate your showing me how a regex can be written to
> > handle an arbitrary number of dot-separated numbers (rather than hard-
> > code distinct sub-expressions).
>
> >>   if  line =~ /^ (.*?) [a-zA-Z] /x
>
> > I thought I could simply remove the question-mark.
> > So, your question mark is clearly working,  but HOW?
>
> Richard
>
> I saw that Gavin has given you (in another thread) a general tutorial on
> this. I'll add a simpler explanation, just in the context of the problem
> we treated:
>
>   .*  means 'as many characters as possible'
>
> Now, point 1 of the 'battle plan' was (I quote):
> "1) we first collect everything until the first letter (not included);
> we will consider this the Prefix."
>
> So we want to tell the Regexp Engine: "as few characters as possible
> until you see a letter (a-zA-Z), then stop right there!".
>
> Let's examine the 2 expressions, with and without the question mark:
>
>            (.*?)                      [a-zA-Z]
>  minimal nr of chars needed until ..  1st letter
>
>            (.*)                       [a-zA-Z]
>  as many chars as you can possibly
>  get away with,  and then ..          a letter
>
> An example:
>
> s="2.1Topic 2.1"
>
> md = s.match( /^ (.*?) [a-zA-Z] /x )
> md[1]  # => "2.1"
>
> md = s.match( /^ (.*) [a-zA-Z] /x )
> md[1]  # => "2.1Topi"
>
> Do you see? Both expressions were satisfied, but in different ways:
> a) the first (with .*?) tried to find the minimal number of characters
> until the first letter, and so it stopped when it found the 'T' of
> Topic.
>
> b) the second expression tried to find as many characters as possible,
> only bounded by having to then find a letter, so it stopped at the 'c'
> of Topic.
>
> With a sense of humour, somebody observed that ".*? values contentment
> over greed"; since then ".*?" has been called "non-greedy", while
> ".*" has been called "greedy".
>
> [I'll stop here, as Gavin has already described '.+' & co. to you.]
>
> One piece of advice: the key to learning regular expressions is to read
> a good book (just trying them drives one insane) while experimenting
> (just reading drives one insane too). The time spent pays you back very
> quickly, at the first serious exercise (as you can develop a
> 'battle plan' rather than a 'guerrilla war' with the regexps).
>
> I am glad that you found the script useful, and I hope that this helped
> too.
>
> Raul
> --
> Posted via http://www.ruby-forum.com/.

Hi Raul,

I forgot to tell you that I finally understand your second example.

> md = s.match( /^ (.*) [a-zA-Z] /x )
> md[1]  # => "2.1Topi"

Without the question mark, in principle, the ".*" initially consumes
all the characters,  but then the match fails, because nothing is left
to match the "[a-zA-Z]".  So the ".*" sort of "backs off", one character
at a time, and satisfies itself with "2.1Topi", leaving the "c" to
satisfy "[a-zA-Z]".

Cool.  Actually,  I read that in "Mastering Regular Expressions, vol.
2",  but it really didn't settle into my Weltanschauung.  But I think
I've got it now!
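
To convince myself, I wrote a rough sketch of that "backing off". It is
only an imitation, not how the regexp engine is really implemented:
greedy_prefix is a name I made up, and it assumes a single-line string.
It tries the longest possible prefix first and gives back one character
at a time until the next character is a letter.

```ruby
# Hypothetical helper (not part of Ruby): imitates /^ (.*) [a-zA-Z] /x
# on a single-line string by trying the longest prefix first.
def greedy_prefix(s)
  s.length.downto(0) do |len|
    next_char = s[len, 1]              # the character ".*" would leave behind
    return s[0, len] if next_char =~ /[a-zA-Z]/
  end
  nil                                  # no letter anywhere: no match
end

s = "2.1Topic 2.1"
puts greedy_prefix(s)                      # 2.1Topi
puts s.match(/^ (.*) [a-zA-Z] /x)[1]       # 2.1Topi
```

Both lines print the same thing for this string, which is all I wanted
to check.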

Furthermore, the "non-greedy question mark" says "consume only as much
as you need in order to satisfy the total RE".  So "(.*?)" needs to
consume all the characters up to, but not including, something that
satisfies the "[a-zA-Z]", which is the "T".
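
The non-greedy case can be imitated the other way round. Again, this is
just my own sketch under the same assumptions (lazy_prefix is a made-up
name, single-line input): start with the shortest prefix and grow it one
character at a time until the next character is a letter.

```ruby
# Hypothetical helper (not part of Ruby): imitates /^ (.*?) [a-zA-Z] /x
# on a single-line string by trying the shortest prefix first.
def lazy_prefix(s)
  0.upto(s.length - 1) do |len|
    next_char = s[len, 1]              # the character right after the prefix
    return s[0, len] if next_char =~ /[a-zA-Z]/
  end
  nil                                  # no letter anywhere: no match
end

s = "2.1Topic 2.1"
puts lazy_prefix(s)                        # 2.1
puts s.match(/^ (.*?) [a-zA-Z] /x)[1]      # 2.1
```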

The one I like and have settled on is:

s="2.1Topic 2.1"
md = s.match( /^ ([\.\d]*) [^\.\d] /x )
  # md[0]  # => "2.1T"
  # md[1]  # => "2.1"
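
For what it's worth, here is how that expression behaves on a few other
lines. The sample strings and the prefix_of helper are made up by me for
illustration. Two things I noticed: a line with no leading number gives
an empty prefix, and a line that is nothing but the number gives no
match at all, because nothing is left for the "[^\.\d]".

```ruby
# The expression from above, wrapped in a hypothetical helper.
PREFIX_RE = /^ ([\.\d]*) [^\.\d] /x

# Returns the leading run of digits and dots, or nil if the line does not match.
def prefix_of(line)
  md = line.match(PREFIX_RE)
  md ? md[1] : nil
end

# Made-up sample lines, just for illustration.
["2.1Topic 2.1", "10.3.7 Heading", "Intro", "1.2.3"].each do |line|
  puts "#{line.inspect} -> #{prefix_of(line).inspect}"
end
# "2.1Topic 2.1" -> "2.1"
# "10.3.7 Heading" -> "10.3.7"
# "Intro" -> ""
# "1.2.3" -> nil
```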