On Thu, Oct 15, 2009 at 8:24 AM, George George
<george.githinji / gmail.com> wrote:
> i have some script in which i would like to match a string against
> 'many' regular expressions patterns.
>
> def group(string)
>   if string =~ /pattern1 |patter2|pattern3|pattern(N)/  #where N => 100
>     group = 1
>   else
>   ....
>   end
> end
>
> My worry is the amount of patterns that i have (exceeding 400) and the
> efficiency and sanity of such an approach. What would you advise?

Your mega-pattern will be quite slow if many strings don't match any
pattern, and even slower if many strings match some patterns
partially, since the regexp engine will end up backtracking a lot.
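
If you want to see what the alternation costs on your own data, one
option is to build it with Regexp.union and time it with Benchmark from
the standard library. This is only a rough sketch: the PATTERNS list and
the two sample strings are made up, and the timings will depend entirely
on your real patterns and your Ruby version.

```ruby
require 'benchmark'

# Hypothetical stand-ins for the 400+ real patterns.
PATTERNS = (1..400).map { |i| /item#{i}\b/ }

# Regexp.union joins the patterns into a single alternation: /p1|p2|.../
MEGA = Regexp.union(PATTERNS)

# Made-up inputs: one that matches a late alternative, one that matches none.
samples = ["prefix item399 suffix", "prefix item-none suffix"]

samples.each do |s|
  secs = Benchmark.realtime { 1_000.times { s =~ MEGA } }
  puts format("%-28s matched=%-5s %.4fs", s.inspect, !!(s =~ MEGA), secs)
end
```

Running the same loop over a sample of your real input should tell you
whether the single big pattern is actually a problem before you
restructure anything.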

Are you sure you can't construct a more general pattern and test for
values of the match data after? I find it hard to imagine 400 useful
patterns without any similar structure. For example,

  GROUPS = {
    "somegroup" => 1,
    "othergroup" => 2
  }
  if string =~ /^(\w+)\s+(\d+)$/
    group = GROUPS[$1.downcase]
  end
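
Wrapped up as a method, the same idea might look like the sketch below.
The group names, the numbers and the "<name> <digits>" line shape are
placeholders, not your real data, and I have used \A and \z instead of
^ and $ so the pattern has to cover the whole string rather than one
line of it.

```ruby
# Hypothetical group names and numbers.
GROUPS = {
  "somegroup"  => 1,
  "othergroup" => 2
}

# Returns the group number for strings shaped like "<name> <digits>",
# or nil when the string has another shape or the name is unknown.
def group_for(string)
  m = /\A(\w+)\s+(\d+)\z/.match(string)
  m && GROUPS[m[1].downcase]
end

p group_for("SomeGroup 42")    # => 1
p group_for("othergroup 7")    # => 2
p group_for("unknown 1")       # => nil
p group_for("no digits here")  # => nil
```

The point is that one regexp match plus one hash lookup replaces the
whole list of alternatives, and adding a group means adding a hash
entry rather than another pattern.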

Another strategy is divide and conquer. See if you can sort your
patterns into groups that are similar and construct a more general
regexp which you can use as an initial filter to determine which of the
actual patterns you need to test against. E.g.

if string =~ /superpattern1/
  if string =~ /pattern1|pattern2|pattern3/
    group = 1
  end
end
if string =~ /superpattern2/
  if string =~ /pattern4|pattern5|pattern6/
  ...
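
With many groups, the nested ifs can be driven from a table instead.
The following is a sketch under assumptions: STAGES, classify and every
pattern in it are invented for illustration, so substitute your own
filters and alternations.

```ruby
# Each entry: [cheap prefilter, detailed alternation, group number].
# All patterns here are hypothetical placeholders.
STAGES = [
  [/\Aerror\b/i, /timeout|refused|reset/, 1],
  [/\Awarn\b/i,  /deprecated|slow query/, 2]
]

# Runs the detailed alternation only when its prefilter matches.
# Returns the first matching group number, or nil if nothing matches.
def classify(string)
  STAGES.each do |filter, detail, group|
    next unless string =~ filter
    return group if string =~ detail
  end
  nil
end

p classify("ERROR connection refused")  # => 1
p classify("warn: slow query (2.3s)")   # => 2
p classify("error: disk full")          # => nil
p classify("info: all good")            # => nil
```

A string that fails a prefilter never touches that stage's detailed
patterns, which is where the saving comes from. It only pays off if the
prefilters are cheap and reject most of the strings.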