Regexps in Ruby can feel like a jagged edge to the otherwise smooth
curves. For all the syntactical smarts we employ so as to avoid
Perl-like punctuation, regular expressions take us two steps back,
their readability deteriorating with their length. They pry us from
Ruby's embrace, requiring proficiency in a new language, bereft of
familiar idioms. Regexps are neither introspective, w.r.t. the pattern
itself, nor extensible with inheritance, mixins, or new method
definitions. Put another way, they don't act like we expect objects to
act.

For some tasks, regexps are certainly the superior tool. When you
require lookahead, nested constructs, and other advanced features,
their worth is evident. However, most regular expressions are nowhere
near as complicated. They may be used to determine whether a String
contains a number, or to extract a YYYY/MM/DD-format date. So, I'm
wondering whether we can make the common case easier by creating a
Regexp-lite that follows Ruby semantics and style without the
unnecessary complexity/punctuation. Traditional regexps would, of
course, still be available for complex tasks.

Imagine a String as an Array of Char objects. The Char class would
define predicate methods corresponding to Unicode and regexp
properties, for instance Char#lowercase? or Char#digit?. The
implementation is trivial because each predicate can use a Regexp
internally.
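
A minimal sketch of how such a Char class might look. The PROPERTIES
table and the exact set of predicates are assumptions made for
illustration, not a fixed proposal; each predicate simply delegates to
a one-character Regexp.

```ruby
# Hypothetical Char wrapper: every predicate is backed by a Regexp.
class Char
  # Predicate name => pattern for exactly one character (illustrative set).
  PROPERTIES = {
    lowercase: /\A\p{Lower}\z/,
    uppercase: /\A\p{Upper}\z/,
    digit:     /\A\p{Digit}\z/,
    word:      /\A\w\z/,
    hex:       /\A\h\z/,
    blank:     /\A[[:blank:]]\z/
  }.freeze

  # Define Char#lowercase?, Char#digit?, and so on from the table.
  PROPERTIES.each do |name, pattern|
    define_method("#{name}?") { pattern.match?(@char) }
  end

  def initialize(char)
    @char = char
  end

  def to_s
    @char
  end
end

chars = "a7 ".each_char.map { |c| Char.new(c) }
chars.map(&:lowercase?)  # => [true, false, false]
chars.map(&:digit?)      # => [false, true, false]
chars.map(&:blank?)      # => [false, false, true]
```

New properties could then be added by reopening Char and defining
another predicate.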

Matching the String against a pattern is now a case of calling the
appropriate predicate methods on the underlying characters. For
example, match(/\w+/) could be expressed as match(:word). This would
succeed if at least one character in the string returns true for
Char#word?. Similarly, match(:hex, :blank, :digit) would:

* Call #hex? on each character until it succeeds, then set a flag to
signify that the match has begun.
* Once #hex? has returned true once, try calling #blank? for every
subsequent character.
  * If it matches, #blank? becomes the current predicate;
  * Otherwise #hex? remains the current predicate.
* Once #blank? is the current predicate, try calling #digit? for every
subsequent character.
  * If it matches, then the match has succeeded so we can stop.
  * If it doesn't match, continue calling #blank? on each character
until its neighbor matches #digit?.
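
The steps above could be sketched roughly as follows. The name
walk_match and the cut-down Char class are hypothetical, and one
behaviour is an assumption on my part: the list does not say what
happens when a character satisfies neither the current predicate nor
the next one, so this sketch simply fails the match at that point.

```ruby
# Minimal stand-in for the Char class described earlier (hypothetical).
class Char
  def initialize(char)
    @char = char
  end

  def hex?;   /\A\h\z/.match?(@char);          end
  def blank?; /\A[[:blank:]]\z/.match?(@char); end
  def digit?; /\A\d\z/.match?(@char);          end
  def word?;  /\A\w\z/.match?(@char);          end
end

# Walk the string once, advancing through the terms without backtracking.
def walk_match(string, *terms)
  raise ArgumentError, "no terms given" if terms.empty?

  predicates = terms.map { |term| :"#{term}?" }
  current = nil # index of the current predicate; nil until the match begins

  string.each_char do |raw|
    char = Char.new(raw)
    following = current && predicates[current + 1]

    if current.nil?
      current = 0 if char.public_send(predicates.first)
    elsif following && char.public_send(following)
      current += 1 # the next term takes over
    elsif !char.public_send(predicates[current])
      return false # assumption: a stray character fails the match
    end

    # Succeed as soon as the last term has matched once.
    return true if current == predicates.size - 1
  end

  false
end

walk_match("ff 42", :hex, :blank, :digit)  # => true
walk_match("ff42",  :hex, :blank, :digit)  # => false
walk_match("hello", :word)                 # => true
```

Because there is no backtracking, a character that satisfies both the
current and the next predicate always goes to the next one.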

IOW, it matches non-greedily, assuming each term should match as many
times as it can while still allowing the overall match to succeed. My
current mockup doesn't try backtracking, but it certainly could. I'm
using String#ematch to add implicit start/end anchors.

So even at this point we have a more Rubyish interface, with no line
noise, which can be extended by modifying the Char class.

Each term has implicit capturing parentheses around it, and #match
will return MatchData[1] so numbered captures will work.

Wildcards can be supported with a Char#any? predicate that always returns true.

Even if you stop here, you have a reasonably capable Regexp subset. If
additional functionality is desired, we can support non-Symbol terms.
A Fixnum term could represent a back-reference. A Range argument could
represent a character class, e.g. match(:digit, 5..9) =>
match(/\d+[5-9]+/). A String could represent a literal String, e.g.
match('glark') => /glark/. An Array could be used for alternation such
that one of its elements is required to match. A Hash could be used
to support named captures such that {:name => :digit} matches digits
and captures them to the :name group. And so on. The overarching
benefit is that patterns feel like Ruby.
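
One way to picture these term types is as a translation into an
ordinary Regexp. This is only a sketch: compile_terms, term_source and
the CLASSES table are hypothetical names, Integer stands in for Fixnum
on current Rubies, the `+` quantifier follows the /\d+[5-9]+/
translation given above, and Hash terms are left out to keep it short.

```ruby
# Predicate name => regexp fragment (illustrative subset).
CLASSES = {
  digit: '\d', word: '\w', hex: '\h', blank: '[[:blank:]]', any: '.'
}.freeze

# Translate a single term into regexp source text.
def term_source(term)
  case term
  when Symbol  then "#{CLASSES.fetch(term)}+"
  when Range   then "[#{Regexp.escape(term.first.to_s)}-" \
                    "#{Regexp.escape(term.last.to_s)}]+"
  when String  then Regexp.escape(term)
  when Array   then term.map { |t| term_source(t) }.join('|')
  when Integer then "\\#{term}" # back-reference to an earlier term
  else raise ArgumentError, "unsupported term: #{term.inspect}"
  end
end

# Wrap every term except back-references in its own capture group.
def compile_terms(*terms)
  source = terms.map do |term|
    term.is_a?(Integer) ? term_source(term) : "(#{term_source(term)})"
  end
  Regexp.new(source.join)
end

compile_terms(:digit, 5..9).source       # => "(\\d+)([5-9]+)"
m = compile_terms(:digit, 5..9).match("1237")
[m[1], m[2]]                             # => ["123", "7"]
compile_terms(:word, '=', 1).match?("ab=ab")  # => true
compile_terms([:digit, 'x']).match("x")[1]    # => "x"
```

The capture groups here are the implicit per-term parentheses
mentioned earlier, which is why match('glark') comes out as /(glark)/
rather than /glark/ in this sketch.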

I've so far talked in terms of matching against Strings, but the
generality of this approach suggests that it could be associated with
any enumerator. (I've overloaded String#chars to return Char objects
for String). This could allow functional-style matching against data
structures.

There are clearly a lot of unanswered questions to consider. Is there
any interest in a core implementation along these lines, or is the
status quo seen as ideal? :-)