Adrian Petru Dimulescu wrote:
> Hello,
>
> I have a regex infinite-loop kind of problem. I use Ruby 1.8.2.  The
> regular expression I used was:
>
> [tT]he\s+(([\w\d\_]+(?:[a-zA-Z][\dA-Z]|[\dA-Z][a-zA-Z])[\w\d\_]*((\s*(\,|and|or)\s*)*[\w\d\_]+(?:[a-zA-Z][\dA-Z]|[\dA-Z][a-zA-Z])[\w\d\_]*)*))\s+((\w+\s+){0,3}\s*(proteins|genes|protein|gene))
>
> I try to match this regex against the string, without the quotes (the
> following should be a whole single line):
>
> "to this end , NK_CTL clones derived from four donors ( KK , GG , GF ,
> and DP ) were tested for their ability to lyse the TAP2_deficient
> RMA_S\HLA_E cell_line incubated with serial_dilutions of the VMAPRTLIL ,
> VMAPRTLVL , VMAPRTLLL , and VMAPRALLL peptides ."
>



regexp =
/ # "The " or "the "
  [Tt]he\s+
  # Protein consists of letters & digits.  Must not be only digits.
  (?!\d+\s)
  [A-Za-z\d]+\s+
  (
    # ", "
    ,\s+
    # Optional "and "
    (and\s+)?
    # Protein consists of letters & digits.  Must not be only digits.
    (?!\d+\s)
    [A-Za-z\d]+\s+
  )*
  # 0--3 adjectives
  ([A-Za-z]+\s+){0,3}
  (protein|gene)s?
/x
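
A quick way to check the pattern above is to run it against the line from the question and against a line that ought to match. The sketch below is only that: the constant names (PROTEIN_RE, TEXT, SAMPLE) and the SAMPLE sentence are made up for illustration, and the regexp body is the one above with the comments dropped. Since this version has no nested repetition that can match the empty string, the failing case should come back right away instead of hanging.

```ruby
# Same pattern as above, comments dropped. PROTEIN_RE is a hypothetical name.
PROTEIN_RE = /
  [Tt]he\s+
  (?!\d+\s)
  [A-Za-z\d]+\s+
  (
    ,\s+
    (and\s+)?
    (?!\d+\s)
    [A-Za-z\d]+\s+
  )*
  ([A-Za-z]+\s+){0,3}
  (protein|gene)s?
/x

# The line from the question, joined back into a single line.
# Single quotes keep the backslash in RMA_S\HLA_E literal.
TEXT = [
  'to this end , NK_CTL clones derived from four donors ( KK , GG , GF ,',
  'and DP ) were tested for their ability to lyse the TAP2_deficient',
  'RMA_S\HLA_E cell_line incubated with serial_dilutions of the VMAPRTLIL ,',
  'VMAPRTLVL , VMAPRTLLL , and VMAPRALLL peptides .'
].join(' ')

# A made-up sentence (an assumption, not from the question) that should match.
SAMPLE = 'The p53 , and BRCA1 tumor suppressor genes'

p PROTEIN_RE.match(TEXT)        # no "protein" or "gene" in the line, so no match
m = PROTEIN_RE.match(SAMPLE)
puts m[0] if m                  # the whole matched phrase
p PROTEIN_RE.match('the 42 genes')  # digits-only name is rejected by the lookahead
```

Note that [A-Za-z\d] leaves out the underscore, so names like NK_CTL or TAP2_deficient are not treated as protein names by this version; add _ to the class if they should be.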