Harry Ohlsen, April 29:

> A colleague of mine just asked me whether it was possible to invert an
> arbitrary regular expression.  Ie, create a new regular expression that
> matches whatever the original *didn't*.

Well, we have to remember that regular languages are closed under
set-difference, so it can be done.  However, this doesn't mean that we
have any way of expressing such a regular language with a regular
expression.  The grammar of Ruby's regular expressions doesn't provide a
set-difference operator, so there's really no way of doing it in Ruby.

It can be done, as you show, for a single literal, but that's not the
same thing.  That's taking the set-difference of the input alphabet and
a given set of literals, i.e., [^a-z] is equivalent to ¦² - {a, b, ¡Ä, z}.
One should also remember that [¡Ä] is only syntactic sugar and is
equivalent to the the union of all members (with provisions for removing
literals with ^ and expanding aãàÅà ranges).

What you want is a complementation operator, i.e., one that gives you
the regular language Σ* - L, where L is the regular language denoted by
your regular expression and - is the set-difference operator.

The issue of matching an inverted regular expression also depends on
what kind of finite state automaton is used.  It's simple to do for
deterministic finite automatons; not so simple for nondeterministic ones
(actually, I'm not even sure if it's possible at all).

So, in summary, Ruby has no way of letting you express complementation.
However, it may not be what you actually want in practice.  Perhaps what
you really want is to match inbetween matches of a regular expression.
This is simple enough to do.  All it takes is to keep track of a couple
of input positions,
        nikolai

-- 
Nikolai Weibull: now available free of charge at http://bitwi.se/!
Born in Chicago, IL USA; currently residing in Gothenburg, Sweden.
main(){printf(&linux["\021%six\012\0"],(linux)["have"]+"fun"-97);}