I'm attempting to write a module whose classes will make it easier to
construct and work with regular expressions. The two (somewhat
self-explanatory) files, main and test, are given below. I would be grateful
for feedback, especially on better ways of doing things. Also note
that I am developing on JRuby, so at the moment JRuby's regex syntax is
better supported than standard Ruby's.

Check out the second (test) file for simple examples of use.

Thanks,
Ken

File 'rex.rb'
-------------------
=begin rdoc
'rex.rb' provides classes intended to make it easier to develop and use
regular expressions. A primary feature is that it lets one easily construct
larger regular expressions out of smaller ones. The other main feature is
that it provides (or will provide) many functions that make it easier to
apply regular expressions in useful ways. I also believe that, though it is
more verbose than standard Regexps, it produces much more readable code when
constructing complex regular expressions.

rex is not intended to be comprehensive; I don't have time for that. My hope
is that it will be useful for the 95% of 'common case' regexes.
=end

CHARACTERS = {
   :dot => ".",             # any character except newline
   :tab => "\\t",
   :vtab => "\\v",
   :newline => "\\n",
   :return => "\\r",
   :backspace => "\\x08",   # "\\b" outside [] is a word boundary, not backspace
   :form_feed => "\\f",
   :bell => "\\a",
   :esc => "\\e",
   :word_char => "\\w",
   :non_word_char => "\\W",
   :whitespace_char => "\\s",
   :non_whitespace_char => "\\S",
   :digit_char => "\\d",
   :non_digit_char => "\\D"
}

class Rex

   attr_writer :is_group

=begin rdoc
Create a new Rex pattern with _string_ as the pattern that will be passed to
Regexp. This is used by other Rex functions; you can also use it to create a
'raw' pattern.
=end
   def initialize(string)
     @pat = string
     @is_group = false
     @regexp = Regexp.new(@pat)
   end

   def index(string, start=0)
     return string.index(@regexp, start)
   end

=begin rdoc
Yields the MatchData for each successive match of this pattern in _string_.
=end
   def each(string)
     start = 0
     # String#index sets $~ to the MatchData of the match it finds.
     while string.index(@regexp, start)
       md = $~
       yield md
       # Continue after this match; step one character past an empty match
       # so the loop always makes progress.
       start = md.end(0) > md.begin(0) ? md.end(0) : md.end(0) + 1
     end
   end
=begin rdoc
Same as =~ on the corresponding Regexp
=end
   def =~(string)
     return @regexp =~ string
   end

=begin rdoc
Returns the pattern associated with this Rex instance. This is the string
that is passed to Regexp to create a new Regexp.
=end
   def pat
     return @pat
   end

   def group
     if @is_group
       return self
     else
       result = Rex.new("(?:#{@pat})")
       result.is_group = true
       return result
     end
   end

=begin rdoc
Regular expression concatenation; Lit.new("ab") + Lit.new("cd") will produce
a Rex that has the same meaning as the Regexp /abcd/ (though the pattern will
be different).
=end
   def +(other)
     return Rex.new(self.group.pat + other.group.pat)
   end

=begin rdoc
Used to define a named group. If _rex_ is a Rex instance with an internal
pattern _pat_, then _rex_['name'] produces a new Rex with pattern
(?<name>_pat_).
=end
   def [](name)
     result = Rex.new("(?<#{name}>#{@pat})")
     result.is_group = true
     return result
   end

   #    def +(other)
   #        r1 = self
   #        r1 = r1.wrap_if_not("+")
   #        other = other.wrap_if_not("+")
   #        r = Regexp.new(r1 + other)
   #        r.operator = "+"
   #        return r
   #    end


=begin rdoc
Regular expression alternation. Lit.new("ab") | Lit.new("cd") will produce
a Rex that has the same meaning as the Regexp /ab|cd/ (though the pattern will
be different).
=end
   def |(other)
     return Rex.new(self.group.pat + "|" + other.group.pat)
   end

=begin rdoc
Same as the corresponding *match* method in Regexp.
=end
   def match(string)
     return @regexp.match(string)
   end

=begin rdoc
Returns a new Rex that is an optional version of this one;  
Lit.new('a').optional
has the same effect as the Regexp /a?/
=end
   def optional
     return Rex.new(self.group.pat + "?")
   end

   # Call on a Rex to indicate it is naturally grouped, i.e. does not need to
   # be surrounded with parens before being put into another Rex.
   def natural_group # :nodoc:
     @is_group = true
     return self
   end

=begin rdoc
Defines regular expression repetitions. Lit.new('a').n(3) is the same as
/a{3,}/, while Lit.new('a').n(3..7) is the same as /a{3,7}/. Use 0 or 1 to
achieve the same effect as the * and + Regexp operators. Three-dot
(end-exclusive) ranges of the form 3...8 are allowed and mean what one would
expect, i.e. that range gives the same result as 3..7.
=end
   def n(repetitions)
     if repetitions.is_a?(Integer)
       return Rex.new(self.group.pat + "{#{repetitions},}").natural_group
     elsif repetitions.is_a?(Range)
       ending = repetitions.end
       if repetitions.exclude_end?
         ending -= 1
       end
       return Rex.new(self.group.pat + "{#{repetitions.begin},#{ending}}").natural_group
     else
       # Fail loudly instead of silently returning nil.
       raise ArgumentError, "repetitions must be an Integer or a Range"
     end
   end

=begin rdoc
Same as method *n*, but nongreedy.
=end
   def n?(repetitions)
     if repetitions.is_a?(Integer)
       return Rex.new(self.group.pat + "{#{repetitions},}?").natural_group
     elsif repetitions.is_a?(Range)
       ending = repetitions.end
       if repetitions.exclude_end?
         ending -= 1
       end
       return Rex.new(self.group.pat + "{#{repetitions.begin},#{ending}}?").natural_group
     else
       # Fail loudly instead of silently returning nil.
       raise ArgumentError, "repetitions must be an Integer or a Range"
     end
   end

   def to_s
     return @pat
   end
end

=begin rdoc
Create a new literal that will match exactly that string. This handles Regexp
escaping for you, so you do not need to worry about characters with special
meanings in Regexp.
=end
class Lit < Rex
   def initialize(string)
     # Escape metacharacters, then let Rex#initialize set @pat, @regexp
     # and @is_group.
     super(Regexp.escape(string))
   end
end

class Chars < Rex
=begin rdoc
Creates a character class that matches the characters given in _include_,
except for those given in _exclude_. Each of _include_ and _exclude_ should
be one of:

* A string, in which case it defines the set of characters to be included
  or excluded.
* A double-dot (x..y) range, which defines a range of characters to be
  included or excluded.
* A list of strings and ranges, which have the same meanings as above and
  are combined to produce the set of characters to be included or excluded.
* A symbol, which denotes one of the POSIX bracket classes; e.g. :alnum
  produces [[:alnum:]].

Note that Chars defines no special characters (strings are escaped with
Regexp.escape).
_include_:: The set of characters to be included in the class. It may be
  nil or the empty string if you don't want to include any characters in
  the class.
_exclude_:: The set of characters to be excluded from the class. Defaults
  to nil.
=end
   def initialize(include, exclude=nil)
     if include == nil or include == ""
       include = nil
     elsif include.is_a?(Array)
       include = list_to_chars(include)
     else
       include = list_to_chars([include])
     end

     # Treat an empty exclude like nil, as for include.
     if exclude == nil or exclude == ""
       exclude = nil
     elsif exclude.is_a?(Array)
       exclude = list_to_chars(exclude)
     else
       exclude = list_to_chars([exclude])
     end

     if exclude == nil
       chars = "[#{include}]"
     elsif include == nil
       chars = "[^#{exclude}]"
     else
       chars = "[#{include}&&[^#{exclude}]]"
     end

     @pat = chars
     @regexp = Regexp.new(@pat)
     @is_group = true
   end

   private

   # Converts a list of Strings, Ranges and Symbols into the text of a
   # character class. Defined as a private method: a def nested inside
   # initialize would (re)define a public Chars#list_to_chars on every call.
   def list_to_chars(list)
     chars = ""
     list.each {|e|
       if e.is_a?(String)
         chars << Regexp.escape(e)
       elsif e.is_a?(Range)
         chars << Regexp.escape(e.begin) << "-" << Regexp.escape(e.end)
       elsif e.is_a?(Symbol)
         chars << "[:" << e.to_s << ":]"
       end
     }
     return chars
   end
end
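
A quick standalone check of the class syntax that Chars emits. It relies on the bracket extensions of the Oniguruma-style engine in Ruby 1.9+ (&& for intersection, [:name:] for POSIX classes); the patterns are the ones expected in test_char_class, and the sample strings are made up for illustration.

```ruby
# Patterns expected by test_char_class, compiled with plain Regexp.
letters = Regexp.new("[abct-z&&[^n-u]]")  # {a,b,c,t..z} minus {n..u}
not_abc = Regexp.new("[^abc]")            # anything except a, b or c
alnum   = Regexp.new("[[:alnum:]]")       # POSIX alphanumeric class

p "abcdtuvxyz".scan(letters).join  # => "abcvxyz"
p(not_abc =~ "abcd")               # => 3
p(alnum =~ "--x")                  # => 2
```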

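A possible simplification of Rex#each, sketched as a standalone helper (each_match is a hypothetical name, not part of rex.rb). String#scan with a block sets $~ to the MatchData of each successive match and already steps past empty matches, so the manual index loop may not be needed.

```ruby
# Hypothetical helper: yields the MatchData of every match of regexp in string.
# Inside the scan block, Regexp.last_match ($~) is the current match.
def each_match(regexp, string)
  string.scan(regexp) { yield Regexp.last_match }
end

found = []
each_match(/a+/, "aababbaaababb") { |md| found << md[0] }
p found  # => ["aa", "a", "aaa", "a"]
```

Inside Rex, the body of each would then reduce to string.scan(@regexp) { yield $~ }.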

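Rex#n and Rex#n? differ only in the trailing '?', so the quantifier text could be built in one place. A sketch, with repetition_suffix as a hypothetical helper name:

```ruby
# Hypothetical helper: builds the {min,max} quantifier text that Rex#n and
# Rex#n? append; lazy = true adds the non-greedy '?'.
def repetition_suffix(repetitions, lazy = false)
  suffix =
    case repetitions
    when Integer
      "{#{repetitions},}"
    when Range
      last = repetitions.end
      last -= 1 if repetitions.exclude_end?  # 3...5 means 3..4
      "{#{repetitions.begin},#{last}}"
    else
      raise ArgumentError, "expected an Integer or Range, got #{repetitions.inspect}"
    end
  lazy ? suffix + "?" : suffix
end

p repetition_suffix(3)            # => "{3,}"
p repetition_suffix(3..5)         # => "{3,5}"
p repetition_suffix(3...5, true)  # => "{3,4}?"
```

With it, n would become Rex.new(group.pat + repetition_suffix(repetitions)).natural_group, and n? the same with true as the second argument.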





File 'rex_test.rb'
-----------------------
$:.unshift File.join(File.dirname(__FILE__),'..','lib')

require 'test/unit'
require 'rex'

class RexTest < Test::Unit::TestCase
   def test_simple
     posint = Rex.new('[0-9]+')
     posfloat = posint + (Lit.new('.') + posint).optional
     float = (Lit.new('+')|Lit.new('-')).optional + posfloat
     complex = float['re'] + (Lit.new('+')|Lit.new('-')) +  
posfloat['im'] + Lit.new('i')
     print complex
     assert_equal(0, posint =~ "123")
     assert_equal(0, posfloat =~ "123.45")
     assert_equal(0, posfloat =~ "123")
     assert_equal("3.45", complex.match(" 3.45-2i")['re'])
   end

   def test_repetitions
     assert_equal("(?:a){3,}", Lit.new('a').n(3).pat)
     assert_equal("(?:a){3,5}", Lit.new('a').n(3..5).pat)
     assert_equal("(?:a){3,4}", Lit.new('a').n(3...5).pat)
     assert_equal("(?:a){3,}?", Lit.new('a').n?(3).pat)
     assert_equal("(?:a){3,5}?", Lit.new('a').n?(3..5).pat)
     assert_equal("(?:a){3,4}?", Lit.new('a').n?(3...5).pat)
   end

   def test_char_class
     assert_equal("[abc]", Chars.new("abc").pat)
     assert_equal("[^abc]", Chars.new(nil, "abc").pat)
     assert_equal("[abc&&[^de]]", Chars.new("abc", "de").pat)
     assert_equal("[abct-z&&[^n-u]]", Chars.new(["abc", "t".."z"],  
"n".."u").pat)
     assert_equal("[[:alnum:]]", Chars.new(:alnum).pat)
   end

   def test_index
     assert_equal(3, Rex.new("a").index("bcda"))
     assert_equal(3, Lit.new("a").index("bcda"))
   end

   def test_each
     pat = Lit.new('a').n(1)
     s = "aababbaaababb"
     result = []
     pat.each(s) {|md|
       result << md[0]
     }
     assert_equal(["aa", "a", "aaa", "a"], result)
   end
end
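
For comparison, and as a possible building block, core Ruby can already compose Regexp objects: interpolating a Regexp into a literal embeds it as a non-capturing (?-mix:...) group, and Regexp.union escapes strings and joins them with |. A sketch of the complex-number pattern from test_simple written that way, assuming Ruby 1.9+ named-group syntax:

```ruby
# Build larger Regexps from smaller ones with interpolation and Regexp.union.
posint   = /[0-9]+/
posfloat = /#{posint}(?:\.#{posint})?/
sign     = Regexp.union("+", "-")   # escapes "+" for you
float    = /#{sign}?#{posfloat}/    # the embedded group keeps "?" on the whole sign
complex  = /(?<re>#{float})#{sign}(?<im>#{posfloat})i/

md = complex.match(" 3.45-2i")
puts md[:re]  # prints 3.45
puts md[:im]  # prints 2
```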