"Florian Gross" <flgr / ccan.de> schrieb im Newsbeitrag
news:2o235cF4pgrgU1 / uni-berlin.de...
> Robert Klemme wrote:
>
> > I once made a script that takes a list of strings and creates a
> > single regexp from them that is supposedly matched fast (i.e. doesn't
> > need backtracking).  If you're interested I can check at work
> > tomorrow - I'm quite sure I still have it somewhere (please email as
> > reminder).
>
> Is this a trie optimization?

Yep.

> I've written something similar for
> Regexp::English but the optimization itself is quite slow and takes up
> lots of resources. (Because it needs to build the trie.)

Well, it was OK in my case because the word list didn't change and I put
the generated regexp into code.  Although I'm not sure that it was really
that slow.  Lemmesee...  Theoretically it should be around O(n*m), with n
the number of words and m the average word length, since building the trie
touches each character of each word exactly once.

> It can do this:
>
> irb(main):006:0> re
> => /foobar|foobark|foobard|foobarqux|food|fool|foolish/
> irb(main):007:0> re.optimize
> => /foo(?:l(?:ish)?|bar(?:qux|[kd])?|d)/

Exactly.  Only that my solution took a set of words as input.
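
Something along these lines (a from-memory sketch, not the actual script;
all names are mine): build a trie from the word list in O(n*m), then
serialize it with shared prefixes factored out, so the resulting
alternation never backtracks over a common prefix.

  def build_trie(words)
    trie = {}
    words.each do |word|
      node = trie
      word.each_char { |ch| node = (node[ch] ||= {}) }
      node[:end] = true   # a complete word ends here
    end
    trie
  end

  def trie_to_src(node)
    ending   = node.key?(:end)
    branches = node.reject { |k, _| k == :end }.map do |ch, child|
      Regexp.escape(ch) + trie_to_src(child)
    end
    return "" if branches.empty?
    src = branches.join("|")
    if ending
      "(?:#{src})?"       # a word may stop here, so the suffixes are optional
    elsif branches.size > 1
      "(?:#{src})"
    else
      src
    end
  end

  def words_to_regexp(words)
    Regexp.new(trie_to_src(build_trie(words)))
  end

  p words_to_regexp(%w[foobar foobark foobard foobarqux food fool foolish])
  # => /foo(?:bar(?:k|d|qux)?|d|l(?:ish)?)/

In practice you would still anchor the result (\b or \A...\z) and could
merge single-character alternatives into a class like your [kd], but the
trie already guarantees that each shared prefix is read only once.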

> It was able to create an optimized Regexp that matches the first few
> hundred English words from a words file -- I've attached it. :)

Beautiful!

Kind regards

    robert