Garance A Drosehn wrote:
> First: He wants a single regex which will verify the syntax of an
> entire line. So, first he wants a true/false value, saying "The line
> is valid, or it is not valid". Never mind any values in the line, just
> "is the line *completely valid*?".
>
> Then, if the line is valid, he wants to break out individual pieces
> of what was scanned, and he wants to do that without re-doing
> any of the scans he did in the first regex. The trick is that some
> of those pieces are a repeating group, such as /(\s\w)*/.
>
> What is confusing us is that he describes this using a simple
> example, and when we solve the simple example he then says
> "you don't get the bigger picture!". Ugh.
>
> Let me give an example, and see if someone can solve it. My
> example might still be something other than what he's thinking
> of, but maybe it will help.
>
> Let's say I'm expecting command lines of the form:
>     first word is either 'copy' or 'duplicate'
>     followed by one or more words
>     followed by the word 'before' or 'after'
>     followed by one or more words
>
> So I could do the first step with the regexp:
>
>     /^(copy|duplicate) \s+ (\w+\s+)+ (before|after) \s+ (\w+\s*)+ $/x
>
> (hopefully I've done that right!). *IF* that matches, then I know
> the entire line is valid. Then, after I know the line is valid, I want
> the array of source-words, and the array of destination-words
> which were matched. I want to do that by picking out information
> in MatchData, not by doing a new scan. The thing is, I don't think
> I have a way of knowing how many times the first '(\w+\s+)+' was
> matched. So I can't just do a slice of $~.captures because I don't
> know what the starting and ending indexes of that slice would be.
>
> I could put another set of parentheses around the two repeating
> groups:
>
>     /^(copy|duplicate) \s+ ((\w+\s+)+) (before|after) \s+ ((\w+\s*)+) $/x
>
> But that doesn't really give me two separate arrays of the
> individual values that made up each group. It just matches
> each group as a whole.
>
> Given two data lines of:
>     copy apple pear plum peach after bill bob
>     duplicate tomato before joe alice alfred tommy jane
>
> I want a way to set two arrays:
>     srcfood = ["apple ", "pear ", "plum ", "peach "]
>     destword = ["bill ", "bob"]
> from the first line, and
>     srcfood = ["tomato "]
>     destword = ["joe ", "alice ", "alfred ", "tommy ", "jane"]
> from the second line.
>
> I'll agree this is a weird example, but I think it shows the issue.
> If I apply the above pattern to the first line, I'll see a MatchData
> result where:
>
>     $~.captures ==
>     ["copy", "apple pear plum peach ", "peach ", "after", "bill bob", "bob"]

DATA.each {|line|
  line.chomp!
  # Validate the whole line once; capture each repeating run as one string.
  md = /^(?:copy|duplicate) \s+ ((?:\w+\s+)+)
    (?:after|before) \s+ ((?:\w+\s*)+) $ /x.match( line )
  next unless md   # skip lines that fail validation
  p md.captures
  # Split the captured runs on whitespace; no second regex scan of the line.
  src_food = md.captures.first.split
  dest_word = md.captures.last.split
  p src_food, dest_word
}

__END__
copy apple pear plum peach after bill bob
duplicate tomato before joe alice alfred tommy jane

----- output: -----
["apple pear plum peach ", "bill bob"]
["apple", "pear", "plum", "peach"]
["bill", "bob"]
["tomato ", "joe alice alfred tommy jane"]
["tomato"]
["joe", "alice", "alfred", "tommy", "jane"]
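
The same idea could be wrapped in a small helper, roughly as sketched below. The names COMMAND and parse_command and the hash layout are my own assumptions for illustration, not anything from the thread. The sketch also keeps the verb and the before/after keyword, and its last line shows why the extra step is needed: a repeated capture group keeps only its final iteration in MatchData.

```ruby
# Sketch only: COMMAND and parse_command are hypothetical names.
# Same grammar as the pattern above, anchored with \A and \z.
COMMAND = /\A(copy|duplicate)\s+((?:\w+\s+)+)(before|after)\s+((?:\w+\s*)+)\z/

# Returns nil for an invalid line, otherwise a hash of the parsed pieces.
def parse_command(line)
  md = COMMAND.match(line.chomp)
  return nil unless md

  verb, sources, position, targets = md.captures
  # Each repeating run was captured as one string; split it on whitespace.
  { verb: verb, sources: sources.split, position: position, targets: targets.split }
end

# A repeated group keeps only its last iteration, so it cannot give the array:
last_only = /\A(\w+\s+)+\z/.match("apple pear plum ")[1]
```

Calling parse_command on each input line gives nil for anything that fails validation, so the caller can tell "invalid" apart from "valid but empty" without a second match.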