On 10/12/07, Ruby Quiz <james / grayproductions.net> wrote:
> by Benjohn Barnes
>
> Usually, you write a regular expression and then search for it in a text string.
> How about doing the opposite? Given a regular expression, generate all of the
> strings that it will match.

Hi, this is my solution. For the design I took the following decisions:

- I will limit the number of repetitions provided by * and + to a
constant. In my examples I've set it to 2

- I will limit the set of characters generated by . and [^...]
constructs to a fixed list of chars. For my tests I used CHARS = %w{a
b c d e}, but also tested with
#CHARS = [("a".."z").to_a, ("A".."Z").to_a, ".", ",", ";"].flatten
but the results can be big !!

I have the concept of repeaters and groups. A repeteater is a class
that knows how to repeat the result of the groups it contains. I've
implemented OneTimeRepeater for the groups without repetition
modifier, StarRepeater, PlusRepeater and QuestionMarkRepeater for the
appropriate modifiers. RangeRepeater takes care of the {a,b} modifier.

Groups: a group represents a piece of the string for which we can
geneate a set of possible values. The groups I have implemented are
SingleChar, CharGroup (don't support ranges yet, just a list of chars)
and Dot. I haven't implemented escaped characters like \s, \d, etc,
but I think they are trivial to implement. I have a couple of special
groups to cope with parens and |. MultiGroup takes care of the parens
nesting by combining the result of the contained repeaters, and
OrGroup adds together the result of the contained groups.

Things I don't support:
- Ranges in char groups [a-zA-Z]
- non-greedy repetitions: I don't even know how to do this, cause I
think you have to take into account what you are going to generate
after the group
- Character classes: \s, \d, etc and [:alpha:], etc, but I think they
are easy to implement
- Backreferences: /(abc)\1/, didn't even thought about them, but
probably a nightmare.
- Other things I don't even know :-).

Most of the interesting stuff is in the parse and combine methods. The
first one understands the syntax and creates groups and repeaters.
Uses recursive calls for parens and |. The combine method is able to
combine an array of arrays to produce all the possible combinations:

[["", "a"], ["b", "bb"]] --> ["b", "bb", "ab", "abb"]

Here it is:

# Number of times to repeat for Star and Plus repeaters
TIMES = 2

# Set of chars for Dot and negated [^] char groups
#CHARS = [("a".."z").to_a, ("A".."Z").to_a, ".", ",", ";"].flatten
CHARS = %w{a b c d e}

class OneTimeRepeater
  def initialize(group)
    @group = group
  end

  def result
    @group.result
  end
end

class StarRepeater
  def initialize(group)
    @group = group
  end

  def result
    r = []
    group_res = @group.result
    group_res.unshift("")
    TIMES.times do
      r << group_res
    end
    combine(r).uniq
  end
end

class PlusRepeater
  def initialize(group)
    @group = group
  end

  def result
    group_res = @group.result
    r = [group_res]
    temp = [""].concat(group_res)
    (TIMES - 1).times do
      r << temp
    end
    combine(r).uniq
  end
end

class QuestionMarkRepeater
  def initialize(group)
    @group = group
  end

  def result
    @group.result.unshift("")
  end
end

class RangeRepeater
  def initialize(group,min,max)
    @group = group
    @min = min
    @max = max
  end

  def result
    result = @group.result
    r = []
    r << [""] if @min == 0
    @min.times {r << result}
    temp = result.dup.unshift("")
    (@max - @min).times do
      r << temp
    end
    combine(r).uniq
  end
end

class SingleChar
  def initialize(c)
    @c = c
  end
  def result
    [@c]
  end
end

# TODO: support ranges [a-zA-Z]
class CharGroup
  def initialize(chars)
    @negative = chars[0] == "^"
    @chars = chars
    @chars = @chars[1..-1] if @negative
  end
  def result
    if @negative
      CHARS - @chars
    else
      @chars
    end
  end
end

class Dot
  def result
    CHARS
  end
end

class MultiGroup
  def initialize(groups)
    @groups = groups
  end

  def result
    strings = @groups.map {|x| x.result}
    combine(strings)
  end
end

class OrGroup
  def initialize(first_groupset, second_groupset)
    @first = first_groupset
    @second = second_groupset
  end

  def result
    strings = @first.map {|x| x.result}
    s = combine(strings)
    strings = @second.map {|x| x.result}
    s.concat(combine(strings))
  end
end



# Combines arrays, calling + on each possible pair
# Starts from the first two arrays, then goes on
# combining another array to the result
def combine(arrays)
  string = arrays.inject do |r, rep|
    temp = []
    r.each {|aa| rep.each {|bb| temp << (aa + bb)}}
    temp
  end
  string
end


def parse(s, i = 0)
  repeaters = []
  group = nil
  while i < s.length
    char = s[i].chr
    case char
    when '('
      groups, i = parse(s, i+1)
      group = MultiGroup.new(groups)
    when ')'
      return repeaters,i
    when '['
      chars = []
      i += 1
      until s[i].chr == ']'
        chars << s[i].chr
        i += 1
      end
      group = CharGroup.new(chars)
    when '.'
      group = Dot.new
    when '|'
      groups, i = parse(s, i + 1)
      group = OrGroup.new(repeaters, groups)
      return [group], i
    else
      group = SingleChar.new(char)
    end

    repeater = nil
    i += 1
    if i < s.length
      case s[i].chr
      when '*'
        repeater = StarRepeater.new(group)
      when '+'
        repeater = PlusRepeater.new(group)
      when '?'
        repeater = QuestionMarkRepeater.new(group)
      when '{'
        first = ""
        i += 1
        while s[i].chr != ","
          first << s[i].chr
          i += 1
        end
        i += 1
        second = ""
        while s[i].chr != "}"
          second << s[i].chr
          i += 1
        end
        repeater = RangeRepeater.new(group, first.to_i, second.to_i)
      else
        repeater = OneTimeRepeater.new(group)
        i -= 1
      end
      i += 1
    else
      repeater = OneTimeRepeater.new(group)
    end
    repeaters << repeater
  end
  return repeaters, i
end

class Regexp
  def generate
    r = self.inspect[1..-2]
    repeaters, _ = parse(r)
    strings = repeaters.map {|x| x.result}
    s = combine(strings)
    s
  end
end

def show(regexp)
  s = regexp.generate
  puts "#{regexp.inspect} --> #{s.inspect}"
  #puts "Checking..."
  #errors = s.reject {|string| string =~ regexp}
  #if errors.size == 0
  #  puts "All strings match"
  #else
  #  puts "These don't match: #{errors.inspect}"
  #end
  #
end

Some tests with TIMES=2 and CHARS = %w{a b c d e}:

show(/ab+[def]?[ghi]j/)
show(/ab*c+/)
show(/ab(c+(de)*)?f/)
show(/a{0,3}/)
show(/a|b|c/)
show(/ab(c)+|xy*|jjjk+[^jk]/)
show(/(lovely|delicious|splendid)(food|snacks|munchies)/)

/ab+[def]?[ghi]j/ --> ["abgj", "abhj", "abij", "abdgj", "abdhj",
"abdij", "abegj", "abehj", "abeij", "abfgj", "abfhj", "abfij",
"abbgj", "abbhj", "abbij", "abbdgj", "abbdhj", "abbdij", "abbegj",
"abbehj", "abbeij", "abbfgj", "abbfhj", "abbfij"]

/ab*c+/ --> ["ac", "acc", "abc", "abcc", "abbc", "abbcc"]

/ab(c+(de)*)?f/ --> ["abf", "abcf", "abcdef", "abcdedef", "abccf",
"abccdef", "abccdedef"]

/a{0,3}/ --> ["", "a", "aa", "aaa"]

/a|b|c/ --> ["a", "b", "c"]

/ab(c)+|xy*|jjjk+[^jk]/ --> ["abc", "abcc", "x", "xy", "xyy", "jjjka",
"jjjkb", "jjjkc", "jjjkd", "jjjke", "jjjkka", "jjjkkb", "jjjkkc",
"jjjkkd", "jjjkke"]

/(lovely|delicious|splendid)(food|snacks|munchies)/ --> ["lovelyfood",
"lovelysnacks", "lovelymunchies", "deliciousfood", "delicioussnacks",
"deliciousmunchies", "splendidfood", "splendidsnacks",
"splendidmunchies"]

It was fun.

Regards,

Jesus.