> Jeffrey Schwab wrote:
> > > You could optimize the regex a little for size, e.g. by factoring out
> > > common prefixes:
> > >
> > > (b(l(a|ub)|ar)|foo)...
> > Peter Schrammel wrote:
> > Thought of that.
>
> Have you seen:
> "Converts a list of words to a regular expression with minimum
> backtracking by joining words with common prefixes. It is a port
> of the Perl module MakeRegex.pm by Hakan Kjellerstrand with
> some improvements."
> http://raa.ruby-lang.org/project/makeregex/
>
> YMMV; I have never used it on anything like the scale you are.

In a little bit of testing here, the regex gets too long after about 8,000 words.

> require 'makeregex'
>
> 20.times do |n|
>   words = IO.readlines("/usr/share/dict/words")[0..(2 ** n)]
>
>   start = Time.now
>
>   r = Regexp.make(words)
>
>   finish = Time.now
>
>   puts "Took #{finish - start} seconds to convert #{words.size} words into a regex #{r.size} bytes long."
>
>   "FOO".match(r)
> end

Took 0.000372 seconds to convert 2 words into a regex 20 bytes long.
Took 0.000285 seconds to convert 3 words into a regex 25 bytes long.
Took 0.000359 seconds to convert 5 words into a regex 51 bytes long.
Took 0.000493 seconds to convert 9 words into a regex 86 bytes long.
Took 0.000973 seconds to convert 17 words into a regex 157 bytes long.
Took 0.001773 seconds to convert 33 words into a regex 285 bytes long.
Took 0.005386 seconds to convert 65 words into a regex 491 bytes long.
Took 0.00823 seconds to convert 129 words into a regex 933 bytes long.
Took 0.019234 seconds to convert 257 words into a regex 1876 bytes long.
Took 0.042557 seconds to convert 513 words into a regex 3856 bytes long.
Took 0.09146 seconds to convert 1025 words into a regex 7807 bytes long.
Took 0.196851 seconds to convert 2049 words into a regex 15669 bytes long.
Took 0.399155 seconds to convert 4097 words into a regex 32325 bytes long.
Took 0.968776 seconds to convert 8193 words into a regex 64671 bytes long.
foo:14:in `match': regular expression too big: /(?:1(?:0(?:80\n|\-point\n|th\n)|...
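
For anyone wondering what "factoring out common prefixes" looks like in practice, here is a rough sketch of the idea using a trie built from nested hashes. This is not how makeregex is implemented (I have not checked its source); the names build_trie, trie_to_source and prefix_regex are made up for illustration, and I am assuming the goal is to match a whole word exactly.

```ruby
# Sketch only: factor common prefixes by building a trie of the words
# and turning it back into a regex source string.
# All method names here are hypothetical, not part of makeregex.

# Build a nested-hash trie; the key :end marks that a word stops at this node.
def build_trie(words)
  trie = {}
  words.each do |word|
    node = trie
    word.each_char { |ch| node = (node[ch] ||= {}) }
    node[:end] = true
  end
  trie
end

# Convert a trie node into regex source, so each shared prefix appears once.
def trie_to_source(node)
  branches = node.keys.reject { |k| k == :end }.sort.map do |ch|
    Regexp.escape(ch) + trie_to_source(node[ch])
  end
  return "" if branches.empty?

  if node[:end]
    # A word ends here, so whatever follows is optional.
    "(?:#{branches.join('|')})?"
  elsif branches.size == 1
    # Only one continuation: no group needed.
    branches.first
  else
    "(?:#{branches.join('|')})"
  end
end

# Anchor the pattern so it matches a complete word only (an assumption).
def prefix_regex(words)
  Regexp.new("\\A(?:#{trie_to_source(build_trie(words))})\\z")
end

puts trie_to_source(build_trie(%w[bla blub bar foo]))
```

Note that IO.readlines keeps the trailing newline on every word (you can see the \n in the error message above), so the words would need a chomp before going into something like this. Factoring prefixes makes the pattern smaller, but I would not expect it to get a full dictionary under whatever limit triggers "regular expression too big".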