On Thu, May 22, 2003 at 08:09:04PM +0900, dblack / superlink.net wrote:
> > There was a discussion a few weeks back about Ruby's handling of ^
> > and $ in regexps, and I have realised what makes me so uncomfortable
> > with it. I'm used to matching strings on /^...$/ to mean "match
> > exactly this", and it doesn't work. In fact it could lead to very
> > nasty security holes. Consider this example:
> 
> But... but... it's not like it's being kept a secret :-)

Well no, if you read the documentation in its entirety, and forget
everything you knew about regexps and Perl previously. But regexp handling
in Ruby cries out "Yes I'm like Perl! I have /regexp/ and =~ and $1,$2..."
and you have to read the small print - or in my case write broken programs -
to discover that something as fundamental as start and end anchoring doesn't
work in the way that you expect.

"Way that I expect" comes not only from Perl, but also from things like Exim
(which embeds PCRE, Perl-compatible Regular Expressions).
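
To make the surprise concrete, here is a minimal sketch. The input string is
made up for illustration; the point is only that in Ruby ^ and $ anchor at
line boundaries, so a multi-line value can pass a check that looks like
"match exactly this".

```ruby
# Hypothetical input: a valid-looking first line followed by a second line.
input = "abc123\nrm -rf /"

# In Ruby, ^ and $ anchor at the start and end of any line, so this
# pattern is satisfied by the first line alone.
line_anchored = !!(input =~ /^[a-z0-9]+$/)

# \A and \z anchor at the start and end of the whole string, so the
# embedded newline and everything after it cause the match to fail.
string_anchored = !!(input =~ /\A[a-z0-9]+\z/)

puts "line anchors matched:   #{line_anchored}"
puts "string anchors matched: #{string_anchored}"
```

The first check succeeds and the second fails, which is exactly the gap
that makes the ^...$ idiom risky for validation.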

> >       str.untaint if str =~ /\A[a-z0-9]+\z/
> >
> > The asymmetry between \A and \z is annoying (I have to keep looking
> > it up to remember which one is capital and which is lower-case), and
> > it leaves regular expressions looking a lot less readable.
> 
> You can probably use \Z in most cases; the only difference between \z
> and \Z is that \Z anchors before a trailing newline, if there is one.

I want to say unambiguously "start of string" and "end of string", with no
messing around. If I am validating a string which is going to be inserted
into another string later on, it's important to me whether the provided
value has or does not have a trailing newline.
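
A small sketch of why \Z is not enough for me; the sample strings are
invented for illustration. \Z accepts a single trailing newline, while \z
only matches at the very end of the string.

```ruby
# Hypothetical sample values: one clean, one with a trailing newline.
clean    = "abc123"
trailing = "abc123\n"

# \Z matches at the end of the string, or just before a final newline.
big_z_clean    = !!(clean    =~ /\A[a-z0-9]+\Z/)
big_z_trailing = !!(trailing =~ /\A[a-z0-9]+\Z/)

# \z matches only at the very end of the string.
small_z_clean    = !!(clean    =~ /\A[a-z0-9]+\z/)
small_z_trailing = !!(trailing =~ /\A[a-z0-9]+\z/)

puts "\\Z: clean=#{big_z_clean} trailing=#{big_z_trailing}"
puts "\\z: clean=#{small_z_clean} trailing=#{small_z_trailing}"
```

Only the \z version rejects the value with the trailing newline, which is
the behaviour I want when the value will be spliced into another string.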

Cheers,

Brian.