On 02/06/2010 07:57 PM, Ralf Mueller wrote:
> Michal Suchanek wrote:
>> Hello
>>
>> I tried scanning for multiple occurrences of a group in a string and
>> match/scan would return only one.
>>
>>
>> "ajabcabck".match /^a*j(?:b*(a+)b+c*)+k$/
>> => #<MatchData "ajabcabck" 1:"a">
>>
>> "ajabcabck".scan /^a*j(?:b*(a+)b+c*)+k$/
>> => [["a"]]
>>
>>
>> clearly the a+ group must match twice to match the string from ^ to $
>> but only a single match is returned.
>>
>> It is possible to use split instead but using a single match would be
>> much nicer.

I would only use #split if you really want to split the string. 
Otherwise please see below.

>> Any workaround?
>>
>> ruby 1.8.7 (2010-01-10 patchlevel 249) [i486-linux]

> as far as i know, nested groups are not allowed. regular expressions do 
> not form a language.

Nested groups *are* allowed.  However, one must understand how group
matching works: for each capturing group, at most *one* capture is
recorded:

irb(main):001:0> s="abaab"
=> "abaab"
irb(main):002:0> /(?:(a+)b)+/.match s
=> #<MatchData "abaab" 1:"aa">
irb(main):003:0> md = /(?:(a+)b)+/.match s
=> #<MatchData "abaab" 1:"aa">
irb(main):004:0> md.to_a
=> ["abaab", "aa"]
irb(main):005:0> md[1]
=> "aa"
irb(main):006:0>

As you can see from this 1.9.1 test, it is the *last* match that is
kept.  I cannot provide an official rationale for this, but one likely
reason is that the memory overhead for storing an arbitrary number of
matches per group can be significant.  Also, the number of groups is
known when a regular expression is compiled, while the number of matches
of each group is only known at match time.  This makes it easy to
allocate the memory needed for a single capture per group, because the
size can be determined at compile time.  Please also note that all
regular expression engines I know handle it that way, i.e. you get at
most one capture per group.
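
To make the "last match wins" behaviour concrete, here is a small sketch
that compares the single capture kept by #match with the captures #scan
collects when the repeated part is applied on its own.  The variable
names are only for illustration.

```ruby
# Each repetition of the outer (?:...)+ group overwrites group 1,
# so only the capture from the final repetition is kept by #match.
s = "abaab"

md = /(?:(a+)b)+/.match(s)
last_capture = md[1]                    # capture of the last repetition only

# Scanning with the inner pattern alone yields one capture per repetition.
all_captures = s.scan(/(a+)b/).flatten

p last_capture   # => "aa"
p all_captures   # => ["a", "aa"]
```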

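In those cases I usually employ a two-level approach: first match the
whole string and capture the repeated part as one group, then scan that
group for the individual captures: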

irb(main):015:0> s = "ajabcaabck"
=> "ajabcaabck"
irb(main):016:0> if /^a*j((?:b*a+b+c*)+)k$/ =~ s
irb(main):017:1> $1.scan(/b*(a+)b+c*/){|m| p m, $1}
irb(main):018:1> end
["a"]
"a"
["aa"]
"aa"
=> "abcaabc"
irb(main):019:0>

Because #scan passes the captured groups to the block, we do not need
$1 inside the block and can simply do:

irb(main):022:0> if /^a*j((?:b*a+b+c*)+)k$/ =~ s
irb(main):023:1> $1.scan(/b*(a+)b+c*/){|m| p m}
irb(main):024:1> end
["a"]
["aa"]
=> "abcaabc"
irb(main):025:0>
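
Outside of irb, the two-level approach could be wrapped in a small
method.  This is only a sketch: the method name repeated_captures is
made up for this example, and the two patterns are the ones used in the
session above.

```ruby
# Hypothetical helper; the name and signature are not from the original post.
# Level 1: validate the whole string and capture the repeated region as a
#          single group.
# Level 2: scan that region with the inner pattern to get every capture.
def repeated_captures(str)
  outer = /^a*j((?:b*a+b+c*)+)k$/
  inner = /b*(a+)b+c*/

  md = outer.match(str)
  return nil unless md          # the string does not have the expected form

  md[1].scan(inner).flatten     # one entry per repetition of the inner group
end

p repeated_captures("ajabcaabck")  # => ["a", "aa"]
p repeated_captures("ajabcabck")   # => ["a", "a"]
p repeated_captures("xyz")         # => nil
```

Note that #scan silently skips text that does not match, so the outer
match is what guarantees that the scanned region really has the
expected shape.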


Kind regards

	robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/