Hi Peter,

This is a good idea... I wasn't clear in my original post, but the
problem is that some of the lines have 3 (\d+) fields, some 4 and some 5.
Also, there are 4 different groups of data sprinkled through a load of
log files.

Another way of slimming down the regex horror might be to use a bunch
of mini regexes and then combine them into "recipes".

So, here is a new method for the Regexp class (shamelessly plagiarized from this group):

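  class Regexp
    # Concatenate two patterns, or a pattern and a literal string.
    def +(other)
      if other.is_a?(Regexp)
        if options == other.options
          Regexp.new(source + other.source, options)
        else
          # to_s embeds other's flags, e.g. (?mx-i:...), so they survive
          Regexp.new(source + other.to_s, options)
        end
      else
        # Anything else is treated as literal text
        Regexp.new(source + Regexp.escape(other.to_s), options)
      end
    end
  end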


r1 = %r{XD\s\*\s}
r2 = %r{(\d)\s(\d+)\s(\d+)\s(\d+)\s(\d+)\s(\d+)\n}mx
r3 = %r{(\d)\s(\d+)\s(\d+)\s(\d+)\s(\d+)\n}mx
r4 = %r{(\d)\s(\d+)\s(\d+)\s(\d+)\n}mx

recipe1 = r1 + r2 + r2 + r3 + r2 + r4 + r3  # .... and so on
recipe2 = r1 + r2 + r4 + r4 + r3            # ....
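
For anyone who would rather not reopen Regexp, here is a minimal sketch of the same "recipe" idea using a plain helper. The helper name `recipe` and the sample string are made up for illustration; they are not from my real logs. It relies on Regexp#to_s wrapping each piece as (?flags:source), so every piece keeps its own options:

```ruby
# Hypothetical helper (not part of Ruby): glue small regexes into one.
# Regexp#to_s yields e.g. "(?mx-i:...)", preserving each part's flags.
def recipe(*parts)
  Regexp.new(parts.map(&:to_s).join)
end

r1 = %r{XD\s\*\s}
r2 = %r{(\d)\s(\d+)\s(\d+)\s(\d+)\s(\d+)\s(\d+)\n}mx
r4 = %r{(\d)\s(\d+)\s(\d+)\s(\d+)\n}mx

# Made-up two-line sample shaped to fit r1 + r2 + r4.
sample = "XD * 1 10 20 30 40 50\n2 11 21 31\n"

m = recipe(r1, r2, r4).match(sample)
captures = m ? m.captures : []
puts captures.inspect
```

The groups inside each (?flags:...) wrapper are still numbered left to right, so the captures come back in recipe order.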


In the end I've used one huge whacking great regex for each "recipe".
My main question was whether we can combine capture groups and term
binding. It seems the precedence in the RE engine is to do the
captures first and then unwind the binding. Or something.
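
To make that concrete, here is a small sketch, assuming "term binding" means putting a quantifier on a capture group. As far as I can tell, a quantified group is still a single group, so after the match it only holds the text from its last iteration. The line is one from my original post:

```ruby
line = "XD 1 0 3235 2197 10907 1631621"

# (\d+) sits inside a repeated non-capturing group. It is still one
# capture group, so only its final iteration is kept.
m = line.match(/XD\s(\d)(?:\s(\d+)){3,5}/)
last_only = m.captures

# scan returns every match, however many fields the line has.
fields = line.scan(/\d+/)

puts last_only.inspect
puts fields.inspect
```

So a repeated group cannot hand back 3, 4 or 5 separate captures on its own; either each field gets its own group (as in r2, r3, r4) or something like scan does the collecting.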

Cheers

SM

On 9/28/07, Peter Szinek <peter / rubyrailways.com> wrote:
> Hey Simon,
>
> >
> > string = <<-EOVAR
> > XD 1 * 100000436 3441863 1550663 1161254 951982
> > XD 1 479903531056 47988002622 21360568539 18276299303 15476234490
> > XD 1 66934 5552 321640438 40297830 0
> > XD 1 0 3235 2197 10907 1631621
> > XD 1 15488078 210564267 574075997 2405132745 7805716381
> > XD 1 0 4949 0 58361 0
> > (goes for about 17 lines, all separated by \n)
> > EOVAR
>
> Maybe I am seriously misunderstanding something, but why not just:
>
> string.split("\n").map{|line| line.scan(/\d+/)} ?
>
> Cheers,
> Peter
> __
> http://www.rubyrailways.com
> http://scrubyt.org
>
>
>


-- 
Simon Mullis
_________________
simon / mullis.co.uk