On Wed, 30 Apr 2003, Austin Ziegler wrote:

> I'm doing something that required RLE, and the code that I
> translated from Perl to do this included the following regexp:
>
>   /^(.*?)((.)\2{2,127})(.*?)$/ois
<snip>

austin - are you sure this works?  it's odd because

>   /^(.*?)((.)\2{2,127})(.*?)$/ois
           ^^^^^^^^^^^^^^
to me this seems to say, "the second group shall be composed of a single char
followed by 2 to 127 copies of the second group itself."  in other words, it
would seem to be recursive.
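for comparison, here is a minimal ruby sketch of what i assume the pattern was
meant to be: the same regexp, but with the backreference pointing at the
single-char group (\3) rather than the enclosing one.  i've also swapped ^/$
for \A/\z and /ois for /m (ruby's dot-matches-newline flag), on the assumption
that the perl original wanted to match across the whole string.  RUN_RE and
the sample input are made up for illustration.

```ruby
# Assumed intent: \3 is the single char captured by (.), so \3{2,127} means
# "that same char 2..127 more times", i.e. a run of 3..128 identical chars.
RUN_RE = /\A(.*?)((.)\3{2,127})(.*?)\z/m

str = "abcccccd"   # hypothetical sample input
if (m = RUN_RE.match(str))
  puts m[1]   # text before the run  => ab
  puts m[2]   # the run itself       => ccccc
  puts m[3]   # the repeated char    => c
  puts m[4]   # text after the run   => d
end
```

runs shorter than 3 chars don't match at all, and a longer run is capped at
128 chars, with the remainder left in the trailing group.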

for instance, this does not work in perl or ruby:

  ~ > cat foo
  'foo' =~ /^(.)((.)\2)/;
  print $1,"\n";
  print $2,"\n";
  print $3,"\n";
  > perl foo



  > ruby foo
  nil
  nil
  nil

but this does:

  ~ > cat foo
  'foo' =~ /^(.)((.)\3)/;
  print $1,"\n";
  print $2,"\n";
  print $3,"\n";
  > perl foo
  f
  oo
  o
  > ruby foo
  f
  oo
  o

i guess what i am saying is that i don't see how the original regexp ever had
valid semantics, and that if it did work, that would seem to imply a broken
perl.

in any case, ruby's behaviour seems correct.
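if the goal is just RLE, a sketch along these lines sidesteps the issue by
numbering the groups so the backreference points at a group that has already
closed.  rle_encode/rle_decode are hypothetical helpers, not austin's code,
and unlike the {2,127} pattern they don't cap run length or treat short runs
specially.

```ruby
# Hypothetical RLE sketch. In /((.)\2*)/m the outer group is 1 and the
# inner (.) is 2, so \2 means "the same char, zero or more further times".
def rle_encode(str)
  str.scan(/((.)\2*)/m).map { |run, char| [run.length, char] }
end

# Rebuild the string by repeating each char count times.
def rle_decode(pairs)
  pairs.map { |count, char| char * count }.join
end

pairs = rle_encode("aaabccdddd")
p pairs                 # => [[3, "a"], [1, "b"], [2, "c"], [4, "d"]]
puts rle_decode(pairs)  # => aaabccdddd
```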

> While what's happening makes sense, I'm wondering if it's correct -- how
> deep should backreferences be nested and considered part of the process?

i think the only meaningful way is for *each* '(' that is neither escaped nor
followed by '?:' to open a new '\n' / '$n' group, numbered in order of the
opening paren, without limiting the depth.
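a quick ruby illustration of that rule (the patterns and strings are made up
for the example): the escaped \( and the (?:x) group don't get numbers, and
nested groups are numbered by their opening paren, however deep they sit.

```ruby
# Escaped \( and non-capturing (?:x) are skipped; ((y)(z)) is group 1,
# (y) is group 2, (z) is group 3.
re = /\((?:x)((y)(z))\)/
p re.match("(xyz)").captures   # => ["yz", "y", "z"]

# Backreferences can reach any of them once they have closed:
# \1 = "abc", \2 = "ab", \3 = "a"
nested = /\A(((a)b)c)\3\2\1\z/
p nested.match?("abcaababc")   # => true
```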

-a

--
  ====================================
  | Ara Howard
  | NOAA Forecast Systems Laboratory
  | Information and Technology Services
  | Data Systems Group
  | R/FST 325 Broadway
  | Boulder, CO 80305-3328
  | Email: ara.t.howard / fsl.noaa.gov
  | Phone:  303-497-7238
  | Fax:    303-497-7259
  ====================================