On Mon, 01 Dec 2008 17:08:16 -0500, Rob Biedenharn wrote:

> On Dec 1, 2008, at 4:32 PM, Joe Wölfel wrote:
>> On 1 déc. 08, at 14:52, Kyle Schmitt wrote:
>>> On Mon, Dec 1, 2008 at 1:51 PM, Kyle Schmitt <kyleaschmitt / gmail.com>
>>> wrote:
>>>> I just wanted to mention another way of combining regexes that may
>>>> help you stay sane: union.
>>>>
>>>> #You write each regex nice and simple like.. startswith=/~23430000/
>>>> codered=/CodeRed/
>>>>
>>>> #Then combine them to a complex one
>>>> combined_regex=Regexp.union(startswith,codered)
>>>>
>>>> When you've got to build up some large regular expressions, this can
>>>> be a godsend, especially when revisiting code you haven't looked at
>>>> in
>>>> awhile.
>>>>
>>>> --Kyle
>>>
>>> Scratch that, not thinking clearly!  This is to match startswith OR
>>> codered, not necessarily both.
>>>
>>> Still, I maintain that this is a way of staying sane with complex
>>> regexes :)
>>>
>>>
>> Interesting that there is a union function but no intersection
>> function.
> 
> 
> How would you even define a regexp (re) that matched only when both of
> two other regexps (re1, re2) matched?
> 
>      class Regexp
>        def self.intersection(re1,re2)
>          union(compile(/(?>#{re1}).*#{re2}/),
>                compile(/(?>#{re2}).*#{re1}/))
>        end
>      end
> 
>      re = Regexp.intersection(re1,re2)
> 
> What would you expect the value to be?  And while Regexp.union is well-
> behaved for multiple arguments, the expansion for more arguments in the
> intersection gets ugly fast.

It's very well defined if you're talking about an underlying 
deterministic finite automaton. That's an simple proof that is usually 
assigned as an exercise for the student in a computer science theory 
course.

If re1 compiles to a DFA dfa1=(S1,s1,A1,f1)
where S1 is the set of all states, s1 is the start state, A1 is the set 
of accepting states, and f1(s'1,input) is the transition function
and re2 compiles to a DFA dfa2=(S2,s2,A2,f2)

Then the intersection of these two languages can be recognized by the DFA
dfa3=(S1 x S2, (s1,s2), A1 x A2, f3)
where x means the cartesian product, and
f3((s'1,s'2),input)=(f1(s'1,input),f2(s'2,input))

Now, how you'd turn that back into a regexp is not so easy...
(but still doable)

--Ken

-- 
Chanoch (Ken) Bloom. PhD candidate. Linguistic Cognition Laboratory.
Department of Computer Science. Illinois Institute of Technology.
http://www.iit.edu/~kbloom1/