Jeffrey Schwab wrote:
> Peter Schrammel wrote:
>
>> got problem with big regexes:
>> I have a regex of about 70000+ words concatenated with '|' that I'd like to
>> match as a regex. /bla|blub|foo|bar|.....(70000)/
>>
>> But unfortunately ruby gives me a 'regular expression too big' if I'm
>> trying to build such a thing.
>> I had a look at the regex.c code and saw the limit of 1 << 16 bytes for
>> regexes. Is there a way around this (without going down to 2000 words) ?
>>
>> Thanks for any hint
>
> You could optimize the regex a little for size, e.g. by factoring out
> common prefixes:
>
> (b(l(a|ub)|ar)|foo)...

Thought of that.

> Of course, that will only help if the | alternatives have a reasonable
> amount of redundancy. Alternatively, you could just break the whole
> thing into multiple expressions. Instead of
>
> if /first_part|second_part/ =~ text
>
> You could try:
>
> if /first_part/ =~ text or /second_part/ =~ text

Yes, that was my next thought, but where to split? Just count the bytes
and split near 1 << 16?
Why is there a limitation at all? I implemented the same thing in Perl
and it does not complain ... Is Perl's regexp engine that much better?

Thanks for the reply
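
For reference, a minimal sketch of the "count the bytes and split" idea. The helper names (build_chunked_regexes, match_any?) and the 60_000-byte default are assumptions made for illustration, not anything from regex.c. The byte count of the pattern source is also only a rough proxy for whatever the engine actually limits, so the threshold may need to be lowered in practice.

```ruby
# Hypothetical helper: split a word list into several Regexp objects so that
# each pattern source stays at or below max_bytes (an assumed safety margin).
def build_chunked_regexes(words, max_bytes = 60_000)
  chunks = [[]]
  size = 0
  words.each do |word|
    escaped = Regexp.escape(word)   # treat each word literally
    cost = escaped.bytesize + 1     # +1 for the '|' separator
    # Start a new chunk when this word would push the current one over the limit.
    if size + cost > max_bytes && !chunks.last.empty?
      chunks << []
      size = 0
    end
    chunks.last << escaped
    size += cost
  end
  chunks.reject(&:empty?).map { |chunk| Regexp.new(chunk.join('|')) }
end

# Hypothetical helper: true if any of the chunked regexes matches the text.
def match_any?(regexes, text)
  regexes.any? { |re| re =~ text }
end

# Tiny demonstration with an artificially small limit.
regexes = build_chunked_regexes(%w[bla blub foo bar], 8)
puts regexes.map(&:source).inspect
puts match_any?(regexes, "some text with bar in it")
```

Splitting on word boundaries rather than at an exact byte offset keeps every alternative intact, at the cost of chunks that are slightly smaller than the limit.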