On Dec 1, 2008, at 5:41 PM, Joe Wölfel wrote:
> On 1 déc. 08, at 17:08, Rob Biedenharn wrote:
>> On Dec 1, 2008, at 4:32 PM, Joe Wölfel wrote:
>>> On 1 déc. 08, at 14:52, Kyle Schmitt wrote:
>>>> On Mon, Dec 1, 2008 at 1:51 PM, Kyle Schmitt <kyleaschmitt / gmail.com> wrote:
>>>>> I just wanted to mention another way of combining regexes that may
>>>>> help you stay sane: union.
>>>>>
>>>>> #You write each regex nice and simple like..
>>>>> startswith=/~23430000/
>>>>> codered=/CodeRed/
>>>>>
>>>>> #Then combine them to a complex one
>>>>> combined_regex=Regexp.union(startswith,codered)
>>>>>
>>>>> When you've got to build up some large regular expressions, this can
>>>>> be a godsend, especially when revisiting code you haven't looked at
>>>>> in a while.
>>>>>
>>>>> --Kyle
>>>>
>>>> Scratch that, not thinking clearly!  This is to match startswith OR
>>>> codered, not necessarily both.
>>>>
>>>> Still, I maintain that this is a way of staying sane with complex
>>>> regexes :)
>>>>
>>>
>>> Interesting that there is a union function but no intersection
>>> function.
>>
>>
>> How would you even define a regexp (re) that matched only when both
>> of two other regexps (re1, re2) matched?
>>
>>    class Regexp
>>      def self.intersection(re1,re2)
>>        union(compile(/(?>#{re1}).*#{re2}/),
>>              compile(/(?>#{re2}).*#{re1}/))
>>      end
>>    end
>>
>>    re = Regexp.intersection(re1,re2)
>>
>> What would you expect the value to be?  And while Regexp.union is
>> well-behaved for multiple arguments, the expansion for more
>> arguments in the intersection gets ugly fast.
>>
>> -Rob
>>
>> Rob Biedenharn		http://agileconsultingllc.com
>> Rob / AgileConsultingLLC.com
>
> Not sure I understand.  Are you arguing that an intersection cannot
> exist as a regular expression or merely that it is hard?

That it becomes combinatorially hard to construct such a regexp in
general. If I want a regexp that matches the intersection of /a/ and
/b/ and /c/ (i.e., contains each of 'a', 'b', and 'c'), I have to
account for all the permutations (manually):
/a.*b.*c/
/a.*c.*b/
/b.*a.*c/
/b.*c.*a/
/c.*a.*b/
/c.*b.*a/

Or combined as:
/(?:a.*(?:b.*c|c.*b))|(?:b.*(?:a.*c|c.*a))|(?:c.*(?:b.*a|a.*b))/
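
To make the growth concrete, here is a rough sketch that generates the same kind of alternation from every ordering of the inputs. The name ordered_intersection is made up for illustration; nothing like it exists in Ruby's Regexp. It assumes the individual matches do not overlap, and it drops flags such as /i because it works from Regexp#source.

```ruby
# Hypothetical helper (not part of Ruby): build one alternation that
# covers every ordering of the given patterns. For n patterns this
# produces n! alternatives, which is the combinatorial problem.
def ordered_intersection(*regexes)
  alternatives = regexes.permutation.map do |ordering|
    # Wrap each source in a non-capturing group so that alternations
    # inside a pattern do not leak into its neighbours.
    Regexp.new(ordering.map { |re| "(?:#{re.source})" }.join('.*'))
  end
  Regexp.union(alternatives)
end

re = ordered_intersection(/a/, /b/, /c/)
re =~ "xcxbxa"   # matches: all three letters are present
re =~ "cb"       # nil: 'a' is missing
```

Three patterns already give 6 alternatives, four give 24, and five give 120.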

That's nasty and so much worse than the union /[abc]/ or /a|b|c/ even
for this relatively simple case.  It would be better to do this at the
application level if you can't guarantee order:

[/a/, /b/, /c/].all? {|re| mystring =~ re }

And then the value of the match can be whatever the application wants
to track.
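
For example, if the application wants the individual matches back, something along these lines would do. This is only a sketch; match_all is a hypothetical name, not an existing method.

```ruby
# Hypothetical helper: return one MatchData per pattern, or nil if
# any of the patterns fails to match the string.
def match_all(string, *regexes)
  matches = regexes.map { |re| re.match(string) }
  matches.all? ? matches : nil
end

if (found = match_all("a cab", /a/, /b/, /c/))
  # Offset of the first match of each pattern, in argument order.
  offsets = found.map { |m| m.begin(0) }
end
```

The order of the patterns in the string no longer matters, and adding a fourth pattern is one more argument instead of 18 more alternatives.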

-Rob