On Nov 22, 3:34 am, Raul Parolari <raulparol... / gmail.com> wrote:
> RichardOnRails wrote:
> > Hi Raul,
>
> > I like your "battle plan"..
> > I especially appreciate your showing me how a regex can be written to
> > handle an arbitrary number of dot-separated numbers (rather than
> > hard-code distinct sub-expressions).
>
> >> if line =~ /^ (.*?) [a-zA-Z] /x
>
> > I thought I could simply remove the question-mark.
> > So, your question mark is clearly working, but HOW?
> > Richard
>
> I saw that Gavin has given you (in another thread) a general tutorial on
> this. I add a simpler explanation just in the context of the problem we
> treated;
>
> .* means 'as many characters as possible'
>
> Now, the point 1 of the 'battle plan' was (I quote):
> "1) we first collect everything until the first letter (not included);
> we will consider this the Prefix."
>
> So we want to tell the Regexp Engine: "as few characters as possible
> until you see a letter (a-zA-Z), then stop right there!".
>
> Let's examine the 2 expressions, with and without the question mark:
>
> (.*?) [a-zA-Z]
> minimal nr of chars needed until .. 1st letter
>
> (.*) [a-zA-Z]
> as many chars you can get
> possible get away with, and then .. a letter
>
> An example:
>
> s="2.1Topic 2.1"
>
> md = s.match( /^ (.*?) [a-zA-Z] /x )
> md[1] # => "2.1"
>
> md = s.match( /^ (.*) [a-zA-Z] /x )
> md[1] # => "2.1Topi"
>
> Have you seen? Both expressions were satisfied, but in different ways:
> a) the first (with .*?) tried to find the minimal number of characters
> until the first letter, and so it stopped when it found the 'T' of
> Topic.
>
> b) the second expression tried to find as many characters as possible,
> only bounded by having to then find a letter, so it stopped at the 'c'
> of Topic.
>
> With sense of humour, somebody observed that ".*? values contentment
> over greed"; and since then the ".*?" were called "not greedy", while
> the ".*" were called "greedy".
>
> [I stop here as Gavin described to you '.+' & co].
>
> One advice: the key to learn the regular expression is to read a good
> book (just trying them drives one insane) while experimenting (just
> reading drives one insane too). The time spent pays you back very
> quickly at the first serious exercise (as you can develop a
> 'battle-plan' rather than a 'guerrilla war' with the regexps).
>
> I am glad that you found the script useful, and I hope that this helped
> too
>
> Raul
> --
> Posted via http://www.ruby-forum.com/.

Hi Raul,

I forgot to tell you that I finally understand your second example.

> md = s.match( /^ (.*) [a-zA-Z] /x )
> md[1] # => "2.1Topi"

Without the question mark, in principle, the ".*" initially consumes all the characters, but then the engine sees that the match fails, because nothing is left to match the "[a-zA-Z]". So the ".*" sort of "backs off", one character at a time, and satisfies itself with "2.1Topi", leaving the "c" to satisfy "[a-zA-Z]". Cool.

Actually, I read that in "Mastering Regular Expressions, vol. 2", but it really didn't settle into my Weltanschauung. But I think I've got it now!

Furthermore, the "non-greedy question mark" says "consume only as much as you need in order to satisfy the total RE". So "(.*?)" needs to consume only the characters up to the first thing that satisfies the "[a-zA-Z]", which is the "T".

The one I finally settled on is:

s="2.1Topic 2.1"
md = s.match( /^ ([\.\d]*) [^\.\d] /x )
# md[0] => "2.1T"
# md[1] => "2.1"
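
For anyone who finds this thread later, here is a small script that puts the three patterns side by side. It is only a sketch: the constant names and the captured_prefix helper are my own inventions for illustration, and the "2.1 Topic" input (with a space after the number) is an assumed extra test case, not something from Raul's script.

```ruby
# Sample line from the thread: a numeric prefix, a title, then more digits.
SAMPLE = "2.1Topic 2.1"

# The three patterns discussed above (/x ignores the spaces in the pattern).
LAZY       = /^ (.*?)    [a-zA-Z] /x   # as few characters as possible
GREEDY     = /^ (.*)     [a-zA-Z] /x   # as many as possible, then back off
CHAR_CLASS = /^ ([.\d]*) [^.\d]   /x   # only dots and digits can be captured

# Hypothetical helper: returns the first capture group, or nil if no match.
def captured_prefix(pattern, line)
  md = line.match(pattern)
  md && md[1]
end

p captured_prefix(LAZY, SAMPLE)        # "2.1"
p captured_prefix(GREEDY, SAMPLE)      # "2.1Topi"
p captured_prefix(CHAR_CLASS, SAMPLE)  # "2.1"

# Assumed extra input: a space between the number and the title.
p captured_prefix(LAZY, "2.1 Topic")        # "2.1 " (trailing space captured)
p captured_prefix(CHAR_CLASS, "2.1 Topic")  # "2.1"
```

The last two lines are why I prefer the character-class version: it says what the prefix may contain instead of saying where it must stop, so a stray space does not end up in the capture.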