> Jeffrey Schwab wrote:
> > > You could optimize the regex a little for size, e.g. by factoring out
> > > common prefixes:
> > >
> > >     (b(l(a|ub)|ar)|foo)...
>
> Peter Schrammel wrote:
> > Thought of that.
>
> Have you seen:
> "Converts a list of words to a regular expression with minimum
> backtracking by joining words with common prefixes. It is a port
> of the Perl module MakeRegex.pm by Hakan Kjellerstrand with
> some improvements."
>     http://raa.ruby-lang.org/project/makeregex/
>
> YMMV; I have never used it on anything like the scale you are.

In a little bit of testing here, the regex gets too big to use at
about 8,000 words.

> require 'makeregex'
>
> 20.times do |n|
>   words = IO.readlines("/usr/share/dict/words")[0..(2 ** n)]
>
>   start = Time.now
>
>   r = Regexp.make(words)
>
>   finish = Time.now
>
>   puts "Took #{finish - start} seconds to convert #{words.size} words into a regex #{r.size} bytes long."
>
>   "FOO".match(r)
> end

Took 0.000372 seconds to convert 2 words into a regex 20 bytes long.
Took 0.000285 seconds to convert 3 words into a regex 25 bytes long.
Took 0.000359 seconds to convert 5 words into a regex 51 bytes long.
Took 0.000493 seconds to convert 9 words into a regex 86 bytes long.
Took 0.000973 seconds to convert 17 words into a regex 157 bytes long.
Took 0.001773 seconds to convert 33 words into a regex 285 bytes long.
Took 0.005386 seconds to convert 65 words into a regex 491 bytes long.
Took 0.00823 seconds to convert 129 words into a regex 933 bytes long.
Took 0.019234 seconds to convert 257 words into a regex 1876 bytes long.
Took 0.042557 seconds to convert 513 words into a regex 3856 bytes long.
Took 0.09146 seconds to convert 1025 words into a regex 7807 bytes long.
Took 0.196851 seconds to convert 2049 words into a regex 15669 bytes long.
Took 0.399155 seconds to convert 4097 words into a regex 32325 bytes long.
Took 0.968776 seconds to convert 8193 words into a regex 64671 bytes long.
foo:14:in `match': regular expression too big:
/(?:1(?:0(?:80\n|\-point\n|th\n)|...
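
For anyone curious what the prefix factoring looks like, here is a minimal sketch of the idea using a nested-Hash trie. It is not makeregex's actual code; build_trie and trie_to_source are names I made up for illustration. It does not get around the size limit either, since the factored regex still grows with the word list.

```ruby
# Hypothetical helper: build a nested-Hash trie from a word list.
# The :end key marks a node where a complete word stops.
def build_trie(words)
  trie = {}
  words.each do |word|
    node = trie
    word.each_char { |ch| node = (node[ch] ||= {}) }
    node[:end] = true
  end
  trie
end

# Hypothetical helper: turn a trie node into regex source,
# factoring out the prefixes shared by its branches.
def trie_to_source(node)
  branches = node.keys.reject { |k| k == :end }.sort.map do |ch|
    Regexp.escape(ch) + trie_to_source(node[ch])
  end
  return "" if branches.empty?

  # A single branch needs no group unless a word also ends at this node.
  body = if branches.size == 1 && !node[:end]
           branches.first
         else
           "(?:#{branches.join('|')})"
         end
  # If a word ends here, everything that follows is optional.
  node[:end] ? "#{body}?" : body
end

source  = trie_to_source(build_trie(%w[bla blub bar foo]))
pattern = Regexp.new("\\A#{source}\\z")

puts source
puts pattern.match?("blub")
puts pattern.match?("bl")
```

Note that words read with IO.readlines keep their trailing newline (that is where the \n in the error message above comes from), so they would need a chomp before going into build_trie.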