On Dec 1, 2008, at 5:41 PM, Joe Wfel wrote:
> On 1 déc. 08, at 17:08, Rob Biedenharn wrote:
>> On Dec 1, 2008, at 4:32 PM, Joe Wfel wrote:
>>> On 1 déc. 08, at 14:52, Kyle Schmitt wrote:
>>>> On Mon, Dec 1, 2008 at 1:51 PM, Kyle Schmitt <kyleaschmitt / gmail.com> wrote:
>>>>> I just wanted to mention another way of combining regexes that may
>>>>> help you stay sane: union.
>>>>>
>>>>> #You write each regex nice and simple like..
>>>>> startswith=/~23430000/
>>>>> codered=/CodeRed/
>>>>>
>>>>> #Then combine them to a complex one
>>>>> combined_regex=Regexp.union(startswith,codered)
>>>>>
>>>>> When you've got to build up some large regular expressions, this can
>>>>> be a godsend, especially when revisiting code you haven't looked at in
>>>>> a while.
>>>>>
>>>>> --Kyle
>>>>
>>>> Scratch that, not thinking clearly!  This is to match startswith OR
>>>> codered, not necessarily both.
>>>>
>>>> Still, I maintain that this is a way of staying sane with complex
>>>> regexes :)
>>>>
>>>
>>> Interesting that there is a union function but no intersection  
>>> function.
>>
>>
>> How would you even define a regexp (re) that matched only when both
>> of two other regexps (re1, re2) matched?
>>
>>    class Regexp
>>      def self.intersection(re1,re2)
>>        union(compile(/(?>#{re1}).*#{re2}/),
>>              compile(/(?>#{re2}).*#{re1}/))
>>      end
>>    end
>>
>>    re = Regexp.intersection(re1,re2)
>>
>> What would you expect the value to be?  And while Regexp.union is  
>> well-behaved for multiple arguments, the expansion for more  
>> arguments in the intersection gets ugly fast.
>>
>> -Rob
>>
>> Rob Biedenharn		http://agileconsultingllc.com
>> Rob / AgileConsultingLLC.com
>
> Not sure I understand.  Are you arguing that an intersection cannot
> exist as a regular expression or merely that it is hard?

That it becomes combinatorially hard to construct such a regexp in
general. If I want a regexp that matches the intersection of /a/ and
/b/ and /c/ (i.e., contains each of 'a', 'b', and 'c'), I have to
account for all the permutations (manually):
/a.*b.*c/
/a.*c.*b/
/b.*a.*c/
/b.*c.*a/
/c.*a.*b/
/c.*b.*a/

Or combined as:  /a.*(?:b.*c|c.*b)|b.*(?:a.*c|c.*a)|c.*(?:b.*a|a.*b)/
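
For what it's worth, here is a rough sketch of generating those orderings
instead of typing them by hand. The method names ordered_alternatives and
permutation_intersection are made up for this example (nothing like them
exists on Regexp), and the approach assumes the individual matches never
need to overlap and that '.' not matching newlines is acceptable:

```ruby
# Hypothetical helper: one regexp per ordering of the given patterns,
# e.g. /a/, /b/, /c/ gives a.*b.*c, a.*c.*b, b.*a.*c, and so on.
def ordered_alternatives(*regexps)
  regexps.permutation.map do |ordering|
    # Regexp#to_s returns an embeddable form such as (?-mix:a)
    Regexp.new(ordering.map(&:to_s).join(".*"))
  end
end

# Hypothetical helper: a single regexp that matches when any ordering matches.
def permutation_intersection(*regexps)
  Regexp.union(*ordered_alternatives(*regexps))
end

re = permutation_intersection(/a/, /b/, /c/)
puts "match"    if re =~ "xcxbxa"   # contains a, b and c in some order
puts "no match" unless re =~ "cb"   # 'a' is missing
```

The number of alternatives is n! for n patterns (6 for three, 120 for
five), so this only automates the ugliness rather than removing it.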

That's nasty and so much worse than the union /[abc]/ or /a|b|c/ even
for this relatively simple case.  It would be better to do this at the
application level if you can't guarantee order:

[/a/, /b/, /c/].all? {|re| mystring =~ re }

And then the value of the match can be whatever the application wants
to track.
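
A minimal sketch of what that could look like, wrapped up as helpers.
The names matches_all? and match_all are invented for illustration, and
returning an array of MatchData objects is just one possible choice of
"value":

```ruby
# Hypothetical helper: true when every pattern matches somewhere in the string.
def matches_all?(string, *regexps)
  regexps.all? { |re| string =~ re }
end

# Hypothetical helper: one MatchData per pattern, in the order the patterns
# were given, or nil if any pattern fails to match.
def match_all(string, *regexps)
  matches = regexps.map { |re| re.match(string) }
  matches.all? ? matches : nil
end

if (matches = match_all("cab", /a/, /b/, /c/))
  matches.each { |m| puts "#{m[0].inspect} found at offset #{m.begin(0)}" }
end
```

Since each pattern is tested on its own, this also copes with matches
that overlap (say /ab/ and /bc/ against "abc"), which a single ordered
regexp like /ab.*bc/ would miss.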