Garance A Drosehn wrote:

> First:  He wants a single regex which will verify the syntax of an
> entire line.  So, first he wants a true/false value, saying "The line
> is valid, or it is not valid".  Never mind any values in the line, just
> "is the line *completely valid*?".
>
> Then, if the line is valid, he wants to break out individual pieces
> of what was scanned, and he wants to do that without re-doing
> any of the scans he did in the first regex.  The trick is that some
> of those pieces are a repeating group, such as /(\s\w)*/.
>
> What is confusing us is that he describes this using a simple
> example, and when we solve the simple example he then says
> "you don't get the bigger picture!".  Ugh.
>
> Let me give an example, and see if someone can solve it.  My
> example might still be something other than what he's thinking
> of, but maybe it will help.
>
> Let's say I'm expecting command lines of the form:
>    first word is either 'copy' or 'duplicate'
>    followed by one or more words
>    followed by the word 'before' or 'after'
>    followed by one or more words
>
> So I could do the first step with the regexp:
>
>   /^(copy|duplicate) \s+ (\w+\s+)+ (before|after) \s+ (\w+\s*)+ $/x
>
>  (hopefully I've done that right!).  *IF* that matches, then I know
> the entire line is valid.  Then, after I know the line is valid, I want
> the array of source-words, and the array of destination-words
> which were matched.  I want to do that by picking out information
> in MatchData, not by doing a new scan.  The thing is, I don't think
> I have a way of knowing how many times the first '(\w+\s+)+' was
> matched.  So I can't just do a slice of $~.captures because I don't
> know what the starting and ending indexes of that slice would be.
> I could put another set of parentheses around the two repeating
> groups:
>
>   /^(copy|duplicate) \s+ ((\w+\s+)+) (before|after) \s+ ((\w+\s*)+) $/x
>
> But that doesn't really give me two separate arrays of the
> individual values that made up each group.  It just matches
> each group as a whole.
>
> Given two data lines of:
>     copy apple pear plum peach after bill bob
>     duplicate tomato before joe alice alfred tommy jane
>
> in the first case I want a way to set two arrays:
>     srcfood = ["apple ", "pear ", "plum ", "peach "]
>     destword = ["bill ", "bob"]
> from the first line, and
>     srcfood = ["tomato "]
>     destword = ["joe ", "alice ", "alfred ", "tommy ", "jane"]
> from the second line.
>
> I'll agree this is a weird example, but I think it shows the issue.
> If I apply the second pattern above to the first line, I'll see a
> MatchData result where:
>
> $~.captures ==
>    ["copy", "apple pear plum peach ", "peach ", "after", "bill bob", "bob"]
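
The captures quoted above show the underlying behaviour: a group that
repeats keeps only the text of its last repetition.  Here is a minimal
sketch of that, using a cut-down one-group pattern and a sample string
of my own (neither is from the original message):

```ruby
# Hypothetical reduced pattern: one repeated capture group.
last_only = /^(\w+\s*)+$/.match("apple pear plum")

# Each repetition overwrites the group, so only the final word is left.
p last_only.captures   # prints ["plum"]

# An outer group keeps the whole run as one string; the inner group
# still holds only the last repetition.
wrapped = /^((\w+\s*)+)$/.match("apple pear plum")
p wrapped.captures     # prints ["apple pear plum", "plum"]
```

So the individual words cannot be read back from the captures alone;
the whole run has to be captured and then taken apart afterwards.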

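DATA.each {|line|   line.chomp!
  # One regex validates the whole line; only the two repeated runs are
  # captured, each as a single string.
  md =
    /^(?:copy|duplicate) \s+
      ((?:\w+\s+)+)
      (?:after|before) \s+
      ((?:\w+\s*)+) $
    /x.match( line )
  next unless md   # skip lines that fail validation
  p md.captures
  # Split each captured run into its individual words.
  src_food = md.captures.first.split
  dest_word = md.captures.last.split
  p src_food, dest_word
}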

__END__
copy apple pear plum peach after bill bob
duplicate tomato before joe alice alfred tommy jane

----- output: -----

["apple pear plum peach ", "bill bob"]
["apple", "pear", "plum", "peach"]
["bill", "bob"]
["tomato ", "joe alice alfred tommy jane"]
["tomato"]
["joe", "alice", "alfred", "tommy", "jane"]
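
The same idea can be wrapped in a method that returns nil for a line
that fails validation.  This is only a sketch: the name parse_command
and the nil-on-failure convention are my own assumptions, not something
from the original question.

```ruby
# Hypothetical helper: validate a command line and return
# [source_words, destination_words], or nil if the line is not valid.
def parse_command(line)
  md =
    /^(?:copy|duplicate) \s+
      ((?:\w+\s+)+)
      (?:after|before) \s+
      ((?:\w+\s*)+) $
    /x.match(line.chomp)
  return nil unless md
  # Each capture is one whitespace-separated run; split it into words.
  md.captures.map(&:split)
end

p parse_command("copy apple pear plum peach after bill bob")
# prints [["apple", "pear", "plum", "peach"], ["bill", "bob"]]
p parse_command("move apple after bill")
# prints nil
```

Note that split drops the trailing whitespace shown in the arrays in the
quoted message.  If that whitespace matters, capture.scan(/\w+\s*/)
should keep it, at the cost of one more small scan over each captured
run.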