Hugh Sasse Staff Elec Eng wrote:
> Well, one way to do it would be to 'take a leaf from' the x modifier,
> which allowed spaces in regexps to be ignored.  /.../r isn't used
> yet, is it?  These could be Ruby extended regexps....

I was about to suggest r as an option to make the engine work in reverse 
direction (as in right-to-left rather than left-to-right if you look at it 
from a Western point of view)... But yeah, I like the sound of an extra 
modifier.

>    foo(*followed-by:bar)

Nice. I'm not sure that a colon is a good idea though -- seems to be used 
as a quote char fairly often. How about /foo(*followed-by*bar)/r or, if the 
/x modifier is used, /foo (*followed-by* bar)/rx, which is maybe more 
readable?

I think this would be a huge benefit for readability -- I was explaining 
some code which made extremely heavy use of lookahead, non-capturing parens 
and so on, and readability was really enhanced when I did a quick search 
and replace of (?=) with (?followed-by?) and so on.
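
To give a rough idea of that search and replace, here is a minimal sketch 
that rewrites the proposed (*name* ...) openers into the standard group 
prefixes before compiling. The readable_regexp helper, the NAMED_EXTENSIONS 
table and the names other than followed-by are all made up for 
illustration; none of this is an existing Ruby feature.

```ruby
# Hypothetical mapping from readable names to the standard group prefixes.
# Only "followed-by" comes from this thread; the rest are illustrative.
NAMED_EXTENSIONS = {
  "followed-by"     => "(?=",
  "not-followed-by" => "(?!",
  "group"           => "(?:",
}.freeze

# Rewrite each "(*name*" opener into its standard equivalent, then compile.
# Unknown names are left untouched so mistakes stay visible in the pattern.
def readable_regexp(source, options = 0)
  translated = source.gsub(/\(\*([a-z\-]+)\*/) do |opener|
    NAMED_EXTENSIONS.fetch($1, opener)
  end
  Regexp.new(translated, options)
end

re = readable_regexp("foo(*followed-by*bar)")
p re.source        # the translated pattern, "foo(?=bar)"
p "foobar" =~ re   # 0, since "bar" follows "foo"
p "foobaz" =~ re   # nil, since "bar" does not follow

# The spaced-out form works too when the x option is passed through.
rx = readable_regexp("foo (*followed-by* bar)", Regexp::EXTENDED)
p "foobar" =~ rx   # 0
```

Doing it as a preprocessing step like this obviously isn't the same as a 
real /r modifier, but it is enough to try the syntax out on existing code.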

>> (There would be other possibilities too, such as playing games with
>> standalone {...} modifiers (as: {something-non-numeric-here}) or
>> introducing some new escaped metacharacter with the special meaning
>> of introducing these named extensions . . . but since most of these
>> extensions "want to be in groups" anyway, using a group as the basic
>> method of introducing an exception into the syntax seems the most
>> practical to me.)
> 
> I think I agree. The escaped letters don't have that facility, and
> \<...\> means something in other RE engines (vi and gawk use them
> for word boundaries, I think).

IIRC, POSIX defines a load of character classes like [[:alpha:]] and so 
on... Might be an idea to copy that syntax in some places?
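
For reference, this is the bracket syntax I mean, as Ruby's regexps 
already accept it inside a character class. Just a small sketch; the 
variable name is arbitrary.

```ruby
# POSIX bracket expressions are only valid inside a character class,
# hence the doubled square brackets.
word = /[[:alpha:]]+/

p "abc123"[word]   # "abc", the leading run of letters
p "123" =~ word    # nil, no letters to match
```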

Whilst we're on the topic, might I also suggest named capturing groups? 
Using backreferences is a pest if an expression has to be changed / merged, 
since all the numbers go wrong. For example,

  /(*capture:my_word*[a-zA-Z]+) +(*backref:my_word*)/r

could match doubled up words (yeah, I know, there's a colon...). Again, 
it's a readability thing mainly. Okay, it would be a performance hit when 
used, but if it's only enabled when necessary it shouldn't be that bad...
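
For comparison, here is a sketch of the same doubled-word match written 
with the (?<name>...) and \k<name> forms. It assumes an engine that 
supports named groups, which Ruby 1.9 and later do; the \b anchors are my 
addition, so that the tail of one word can't pair up with the next word.

```ruby
# Numbered version: \1 breaks as soon as another group is added in front.
numbered = /\b([a-zA-Z]+) +\1\b/

# Named version: the backreference survives edits to the rest of the pattern.
# Assumes named-group support (Ruby 1.9 or later).
doubled = /\b(?<my_word>[a-zA-Z]+) +\k<my_word>\b/

m = doubled.match("this is is a test")
p m[:my_word]                          # "is"
p m[0]                                 # "is is", the whole doubled pair
p doubled.match("no repeats here")     # nil
p numbered.match("this is is a test")[1]   # "is", same result by number
```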

Just a thought...

--
Ciaran McCreesh
mail:     keesh@users-dot-sf-dot-net
www:      http://www.opensourcepan.co.uk/