Hello --

On Fri, 11 Oct 2002, Dale Martenson wrote:

> The latest edition of "Mastering Regular Expressions, 2nd Edition" refers to
> Ruby (Yippy!!), but not always in a positive light (Bummer!!).

Take heart: some of the things you've listed are not negative, but
merely descriptive.

> page 128, Table 3-11: Line Anchors for Some Scripting Languages. This table
> lists "Concerns" and how they are handled. Under the Ruby column, the
> following "Concerns" are noted:

I don't think he's using "concern" in a negative way here.  It's just
a chart of "concerns", in the sense of "things that come into play
where line anchors are involved", and how a variety of languages
handle them.  If they were all negative points, he'd be condemning the
existence of line anchor handling in all of these languages :-)

> Concern: "^ matches after any newline".
> -- Note: "Ruby's $ and ^ match at embedded newlines, but its \A and \Z do
> not"
>
> Concern: "$ matches before any newline"
> -- Note: "Ruby's $ and ^ match at embedded newlines, but its \A and \Z do
> not"
>
> Under the title "Enhanced line-anchor mode. . ."
> -- Note: "N/A". Indicates Ruby does not have this feature. While every other
> language listed does (Java, Perl, PHP, Python, Tcl, .NET).

That's because Ruby doesn't need it :-)

In Ruby, $ and ^ always match starts and ends of lines (embedded or
otherwise), while \A and the \Z,\z pair match the beginning and end
of strings (\Z and \z differing as to whether they match before or
after a final newline, if any).  Therefore, you don't need a special
"mode" indicating that $ and ^ should temporarily change their
meanings.  It's very simple and very consistent.
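
For what it's worth, here is a small sketch of the difference.  The
method name anchor_matches and the sample string are mine, made up
purely for illustration:

```ruby
# Collect what each anchor picks out of a multi-line string.
# (anchor_matches is a hypothetical helper, not part of Ruby.)
def anchor_matches(text)
  {
    line_starts:  text.scan(/^\w+/),   # ^  matches after every newline too
    line_ends:    text.scan(/\w+$/),   # $  matches before every newline too
    string_start: text.scan(/\A\w+/),  # \A matches only at the very start
    before_final: text.scan(/\w+\Z/),  # \Z matches at the end, or before a final newline
    very_end:     text.scan(/\w+\z/)   # \z matches only at the very end
  }
end

# The sample string ends in a newline, so \Z and \z give different results.
p anchor_matches("alpha one\nbeta two\n")
```

With that sample, ^ and $ each find a word on both lines, \A finds
only "alpha", \Z finds "two", and \z finds nothing because the string
ends in a newline rather than a word character.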

> Concern: "\A always matches like normal ^"
> -- Note: "Ruby's \A, unlike its ^, matches only at the start of the string"
>
> Concern: "\Z always matches like normal $"
> -- Note: "Ruby's \Z, unlike its $, matches at the end of the string, or
> before a string-ending newline"
>
> Concern: "\z always matches only at end of string"
> -- Note: "N/A".

Yes, all related to the same point: in Ruby, \A, \Z/\z, ^, and $ all do
their jobs without overlapping or needing behavior-altering switches.

> page 131, "My testing has shown that java.util.regex and Ruby have \G match
> at the start of the current match, while Perl and the .NET languages have it
> match at the end of the previous match. (Sun tells me that the next release
> of java.util.regex will have its \G behavior match the documentation.)"

Hmmm.  I'm not sure about that one, or why there's that difference.

> page 132, Table 3-12: A Few Utilities and Their Word Boundary
> Metacharacters. The table indicates that Ruby does not support
> "Start-of-word" and "End-of-word" boundary characters [e.g. Perl: (?<!\w)
> (?=\w) ... (?<=\w) (?!\w) ].

I'm puzzling through why one would need those as long as one has \b
and \B.  It looks like Friedl is saying: here are different ways to
achieve this, with \b/\B, with lookahead/lookbehind, or both.  I'm
not sure whether there are plans afoot to add lookbehind to Ruby
regexes, but in any case, I'd use \b and \B for word boundaries.
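
If you did want explicit start-of-word and end-of-word assertions, I
believe \b plus lookahead alone would do it, with no lookbehind
needed.  A rough sketch (the constant and method names are
hypothetical, just for illustration):

```ruby
# Start of word: a word boundary with a word character right after it.
START_OF_WORD = /\b(?=\w)/

# End of word: a word boundary with no word character right after it.
# At a boundary, that means the character before it must be a word character.
END_OF_WORD = /\b(?!\w)/

# Wrap every word in angle brackets using the two zero-width patterns.
# (mark_words is a made-up helper for this example.)
def mark_words(text)
  text.gsub(START_OF_WORD, "<").gsub(END_OF_WORD, ">")
end

puts mark_words("foo bar, baz")
```

That should print "<foo> <bar>, <baz>", which is the same effect one
would get from dedicated \< and \> metacharacters.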

I notice this from the Ruby ChangeLog:

  Thu Nov  4 17:41:18 1999  Yukihiro Matsumoto  <matz / netlab.co.jp>

	  * regex.c (re_compile_pattern): \< (wordbeg), \> (wordend)
            disabled.

So I guess it existed at some point and Matz decided we didn't need
it.  (You can see its remains in regex.c :-)

> page 133, "Ruby has a bug whereby sometimes (?i) doesn't apply to
> |-separated alternatives that are lowercase (but does if they're
> uppercase)."

I wish he'd provide an example....

> I am not a Master at Regular Expressions so I would like comments on if
> these things should change (or possibly already are changed) in Ruby.

Any that are bugs should change :-)  The line and string boundary
syntax in Ruby seems to me to be exemplary; I wouldn't want to see
that regress.


David

-- 
David Alan Black                      | Register for RubyConf 2002!
home: dblack / candle.superlink.net     | November 1-3
work: blackdav / shu.edu                | Seattle, WA, USA
Web:  http://pirate.shu.edu/~blackdav | http://www.rubyconf.com