On 02/06/2010 07:57 PM, Ralf Mueller wrote:
> Michal Suchanek wrote:
>> Hello
>>
>> I tried scanning for multiple occurences of a group in a string and
>> match/scan would return only one.
>>
>>
>> "ajabcabck".match /^a*j(?:b*(a+)b+c*)+k$/
>> => #<MatchData "ajabcabck" 1:"a">
>>
>> "ajabcabck".scan /^a*j(?:b*(a+)b+c*)+k$/
>> => [["a"]]
>>
>>
>> clearly the a+ group must match twice to match the string from ^ to $
>> but only single match is returned.
>>
>> It is possible to use split instead but using a single match would be
>> much nicer.

I would only use #split if you really want to split the string.
Otherwise please see below.

>> Any workaround?
>>
>> ruby 1.8.7 (2010-01-10 patchlevel 249) [i486-linux]

> as far as i know, nested groups are not allowed. regular expressions do
> not form a language.

Nested groups *are* allowed. However, one must understand how group
matching works: for each capturing group, at most *one* capture is
recorded:

irb(main):001:0> s="abaab"
=> "abaab"
irb(main):002:0> /(?:(a+)b)+/.match s
=> #<MatchData "abaab" 1:"aa">
irb(main):003:0> md = /(?:(a+)b)+/.match s
=> #<MatchData "abaab" 1:"aa">
irb(main):004:0> md.to_a
=> ["abaab", "aa"]
irb(main):005:0> md[1]
=> "aa"
irb(main):006:0>

As you can see from this 1.9.1 test, it is the *last* match. I cannot
provide an official rationale for this, but one likely reason is that
the memory overhead for storing an arbitrary number of matches per group
can be significant. Also, the number of groups is known when a regular
expression is compiled, while the number of matches of each group is
only known at match time. This makes it easier to allocate the memory
needed for storing a single capture per group, because it can be done
when the regular expression is compiled.

Please also note that all regular expression engines I know handle it
that way, i.e. you get at most one capture per group.
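
To make the "one capture per group, and it is the last one" point concrete, here is a minimal sketch. The method name inspect_repeated_group is made up for illustration and is not part of Ruby; it only reports what MatchData already exposes for the pattern used in the irb session above.

```ruby
# Hypothetical helper (not a Ruby built-in): reports the whole match,
# the content of group 1, and the number of slots in the MatchData
# for the pattern used in the irb session above.
def inspect_repeated_group(str)
  md = /(?:(a+)b)+/.match(str)
  return nil unless md

  {
    :whole  => md[0],   # text covered by all iterations of the outer group
    :group1 => md[1],   # only what (a+) captured on the final iteration
    :slots  => md.size  # whole match + one slot per capturing group
  }
end

p inspect_repeated_group("abaab")      # two iterations of the outer group
p inspect_repeated_group("abaabaaab")  # three iterations, still one slot for group 1
```

However often the outer group repeats, the number of slots stays at two (the whole match plus group 1), which fits the point that the number of groups is fixed by the pattern rather than by the input.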

In those cases I usually employ a two-level approach: first match the
whole repeated section as a single group, then scan that section for
the individual occurrences:

irb(main):015:0> s = "ajabcaabck"
=> "ajabcaabck"
irb(main):016:0> if /^a*j((?:b*a+b+c*)+)k$/ =~ s
irb(main):017:1>   $1.scan(/b*(a+)b+c*/){|m| p m, $1}
irb(main):018:1> end
["a"]
"a"
["aa"]
"aa"
=> "abcaabc"
irb(main):019:0>

Because #scan yields the group captures of each match to the block, we
do not need $1 inside the block and can simply do:

irb(main):022:0> if /^a*j((?:b*a+b+c*)+)k$/ =~ s
irb(main):023:1>   $1.scan(/b*(a+)b+c*/){|m| p m}
irb(main):024:1> end
["a"]
["aa"]
=> "abcaabc"
irb(main):025:0>

Kind regards

	robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
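
If the two-level approach is needed in more than one place, it could be wrapped in a small method. The following is only a sketch: all_captures, OUTER and INNER are names invented here, and it assumes the outer pattern captures the whole repeated section as group 1 while the inner pattern matches one repetition with the wanted text as its group 1.

```ruby
# Hypothetical helper: collect every capture of a repeated inner group.
#   outer - must capture the whole repeated section as its group 1
#   inner - matches one repetition, with the wanted text as its group 1
def all_captures(str, outer, inner)
  md = outer.match(str)
  return [] unless md            # the string does not match as a whole

  # String#scan returns one array of group captures per match;
  # keep only the first (and here only) group of each.
  md[1].scan(inner).map { |groups| groups.first }
end

OUTER = /^a*j((?:b*a+b+c*)+)k$/
INNER = /b*(a+)b+c*/

p all_captures("ajabcaabck", OUTER, INNER)  # string from the session above
p all_captures("ajabcabck",  OUTER, INNER)  # string from the original question
p all_captures("no match",   OUTER, INNER)  # outer pattern fails
```

For "ajabcaabck" this gives the same "a" and "aa" as the irb session. Note that scanning with the inner pattern is not guaranteed in general to cut the section at the same places the outer match did, so the two patterns have to be kept consistent by hand.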