On Thu, Oct 15, 2009 at 8:24 AM, George George
<george.githinji / gmail.com> wrote:
> i have some script in which i would like to match a string against
> 'many' regular expressions patterns.
>
> def group(string)
>   if ...
>     group = 1
> else
> ...
>  
> end
>
> My worry is the amount of patterns that i have (exceeding 400) and the
> efficiency and sanity of such an approach.What would you advice?

Your mega-pattern will be quite slow if many strings don't match any
pattern, and even slower if many strings match some patterns
partially, since the regexp engine will end up backtracking a lot.
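
If you want to check this against your own data before restructuring
anything, a rough timing sketch along these lines might help. The
patterns, the inputs and the names PATTERNS, MEGA and time_matches are
all invented for illustration; substitute your real patterns and a
sample of real strings.

```ruby
require "benchmark"

# Hypothetical stand-ins for the 400 real patterns.
PATTERNS = (1..400).map { |i| /item#{i}\s+\d+/ }

# One big alternation built from all of them.
MEGA = Regexp.union(PATTERNS)

# Time `runs` match attempts of `regexp` against `input`, in seconds.
def time_matches(regexp, input, runs = 1_000)
  Benchmark.realtime { runs.times { regexp =~ input } }
end

samples = {
  "hit"  => "item400 12345",  # matches the last alternative
  "miss" => "item400 abcde"   # starts like a match, then fails
}

samples.each do |label, input|
  puts format("%-4s %.4fs", label, time_matches(MEGA, input))
end
```

The absolute numbers depend on your Ruby version and on the patterns,
so only the comparison between inputs is worth reading.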

Are you sure you can't construct a more general pattern and test the
values of the match data afterwards? I find it hard to imagine 400 useful
patterns without any similar structure. For example,

  GROUPS = {
    "somegroup" => 1,
    "othergroup" => 2
  }
  if string =~ /^(\w+)\s+(\d+)$/
    group = GROUPS[$1.downcase]
  end
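
Wrapped up as a method, the same idea could look like the sketch
below. I am assuming your strings have a "<word> <digits>" shape, which
is only a guess; GROUP_BY_KEYWORD and group_for are made-up names.

```ruby
# Hypothetical lookup table: captured keyword => group number.
GROUP_BY_KEYWORD = {
  "somegroup"  => 1,
  "othergroup" => 2
}.freeze

# Returns the group number for a string shaped like "<word> <digits>",
# or nil if the shape is different or the keyword is unknown.
def group_for(string)
  match = /\A(\w+)\s+(\d+)\z/.match(string)
  return nil unless match

  GROUP_BY_KEYWORD[match[1].downcase]
end

p group_for("SomeGroup 42")
p group_for("unknown 7")
```

This runs one regexp per string plus a hash lookup, so adding a 401st
keyword means adding a hash entry rather than another alternative.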

Another strategy is divide and conquer. See if you can sort your
patterns into groups that are similar and construct a more general
regexp for each group, which you can use as an initial filter to
determine which of the actual patterns you need to test against. E.g.

if string =~ /superpattern1/
  if string =~ /pattern1|pattern2|pattern3/
    group = 1
  end
end
if string =~ /superpattern2/
  if string =~ /pattern4|pattern5|pattern6/
  ...
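
If the nested ifs get unwieldy, the same two-stage test can be driven
from a table. This is only a sketch: the prefilters, the patterns and
the names BUCKETS and classify are invented, and I am assuming the
first matching pattern decides the group.

```ruby
# Hypothetical buckets: a cheap prefilter regexp guards each batch of
# more specific patterns, and each specific pattern maps to a group.
BUCKETS = [
  [/\Aerror/i, [[/timeout|refused/, 1], [/disk full/, 2]]],
  [/\Awarn/i,  [[/deprecated/, 3]]]
].freeze

# Returns the group of the first specific pattern that matches inside
# a bucket whose prefilter matches, or nil if nothing matches.
def classify(string)
  BUCKETS.each do |prefilter, candidates|
    next unless prefilter =~ string

    candidates.each do |pattern, group|
      return group if pattern =~ string
    end
  end
  nil
end

p classify("error: connection refused")
p classify("info: timeout")
```

A string that fails every prefilter only costs one cheap test per
bucket instead of a pass over all 400 patterns.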