On Fri, Jun 27, 2008 at 5:56 PM, Matthew Moss <matthew.moss / gmail.com> wrote:
> ## Statistician I (#167)
>
> This week begins a three-part quiz, the final goal to provide a little
> library for parsing and analyzing line-based data. Hopefully, each portion
> of the larger problem is interesting enough on its own, without being too
> difficult to attempt. The first part -- this week's quiz -- will focus on
> the pattern matching.
>
> Let's look at a bit of example input:
>
>     You wound Perl for 15 points of Readability damage.
>     You wound Perl with Metaprogramming for 23 points of Usability damage.
>     Your mighty blow defeated Perl.
>     C++ walks into the arena.
>     C++ wounds you with Compiled Code for 37 points of Speed damage.
>     You wound C++ for 52 points of Usability damage.
>
> Okay, it's silly, but it is similar to a much larger data file I'll provide
> at the end for testing.
>
> You should definitely note the repetitiveness: just the sort of thing that
> we can automate. In fact, I've examined the input above and created three
> rules (a.k.a. patterns) that match (most of) the data:
>
>     [The ]<name> wounds you[ with <attack>] for <amount> point[s] of <kind>[ damage].
>     You wound[ the] <name>[ with <attack>] for <amount> point[s] of <kind>[ damage].
>     Your mighty blow defeated[ the] <name>.
>
> There are a few guidelines about these rules:
>
> 1. Text contained within square brackets is optional.
> 2. A word contained in angle brackets represents a field; not a literal
>    match, but data to be remembered.
> 3. Fields are valid within optional portions.
> 4. You may assume that both the rules and the input lines are stripped of
>    excess whitespace on both ends.
>
> Assuming the rules are in `rules.txt` and the input is in `data.txt`,
> running your Ruby script as such:
>
>     > ruby reporter.rb rules.txt data.txt
>
> Should generate the following output:
>
>     Rule 1: Perl, 15, Readability
>     Rule 1: Perl, Metaprogramming, 23, Usability
>     Rule 2: Perl
>     # No Match
>     Rule 0: C++, Compiled Code, 37, Speed
>     Rule 1: C++, 52, Usability
>
>     Unmatched input:
>     C++ walks into the arena.
>

Hi,

This is my attempt at this quiz. I thought it would be cool to store the
field names too, for each match, so I also added a verbose output that shows
each field name with its value. As the goal was to be flexible, I wrapped
everything in a few classes to prepare for the next parts of the quiz:

class Match
  attr_accessor :captures, :mappings, :rule

  def initialize captures, mappings, rule
    @captures = captures
    @mappings = mappings
    @rule = rule
  end

  # Default output lists the captured values; verbose output pairs each
  # field name with its value. Fields from unmatched optional parts are skipped.
  def to_s verbose=false
    s = "Rule #{@rule.id}: "
    if verbose
      @rule.names.each_with_index {|n,i| s << "[#{n} => #{@mappings[n]}]" if @captures[i]}
      s
    else
      s + @captures.compact.join(", ")
    end
  end
end

class Rule
  attr_accessor :names, :id

  # Translate rules to regexps, specifying if the first captured group
  # has to be remembered
  RULE_MAPPINGS = {
    "[" => ["(?:", false],
    "]" => [")?", false],
    /<(.*?)>/ => ["(.*?)", true],
  }

  def initialize id, line
    @id = id
    @names = []
    escaped = escape(line)
    # The block parameter is named pattern so it does not clobber line
    reg = RULE_MAPPINGS.inject(escaped) do |pattern, (tag, value)|
      replace, remember = *value
      pattern.gsub(tag) do |m|
        @names << $1 if remember
        replace
      end
    end
    # Anchor the pattern so that a rule has to match the whole line
    @reg = Regexp.new("\\A#{reg}\\z")
  end

  def escape line
    # From the mappings, change the regexp sensitive chars with non-sensitive ones
    # so that we can Regexp.escape the line, then sub them back
    escaped = line.gsub("[", "____").gsub("]", "_____")
    escaped = Regexp.escape(escaped)
    escaped.gsub("_____", "]").gsub("____", "[")
  end

  # Return a Match for this rule, or nil if the line does not match
  def match data
    m = @reg.match data
    return nil unless m
    map = Hash[*@names.zip(m.captures).flatten]
    Match.new m.captures, map, self
  end
end

class RuleSet
  def initialize file
    @rules = []
    File.open(file) do |f|
      f.each_with_index {|line, i| @rules << Rule.new(i, line.chomp)}
    end
  end

  # Return the Match for the first rule that matches, or nil
  def apply data
    match = nil
    @rules.find {|r| match = r.match data}
    match
  end
end

rules_file = ARGV[0] || "rules.txt"
data_file = ARGV[1] || "data.txt"

rule_set = RuleSet.new rules_file
matches = nil
unmatched = []
File.open(data_file) do |f|
  matches = f.map do |line|
    m = rule_set.apply line.chomp
    unmatched << line unless m
    m
  end
end

matches.each do |m|
  if m
    puts m
  else
    puts "# No Match"
  end
end

unless unmatched.empty?
  puts
  puts "Unmatched input:"
  puts unmatched
end

#~ puts "Verbose output:"
#~ matches.each do |m|
#~   if m
#~     puts m.to_s(true)
#~   else
#~     puts "# No Match"
#~   end
#~ end
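
For anyone who wants to look at the rule-to-regexp translation in isolation, here is a minimal sketch that does it in a single gsub pass instead of the placeholder trick in Rule#escape. The helper name rule_to_regexp is made up for illustration and is not part of the script above. It assumes brackets are never nested, and it anchors the pattern at both ends, which should be safe given that the quiz says lines are stripped.

```ruby
# Hypothetical helper (not part of the script above): translate one rule
# into an anchored Regexp and return it with the field names in order.
# Assumes optional [ ... ] sections are never nested.
def rule_to_regexp(rule)
  names = []
  # Tokens: "[", "]", a <field>, a run of literal text, or a stray "<".
  source = rule.gsub(/\[|\]|<.*?>|[^\[\]<]+|</) do |token|
    case token
    when "[" then "(?:"              # open an optional, non-capturing group
    when "]" then ")?"               # close it and make it optional
    when /\A<.+>\z/
      names << token[1..-2]          # remember the field name
      "(.*?)"                        # capture the field lazily
    else
      Regexp.escape(token)           # literal text must match exactly
    end
  end
  [Regexp.new("\\A#{source}\\z"), names]
end

rule = "You wound[ the] <name>[ with <attack>] for <amount> point[s] of <kind>[ damage]."
regexp, names = rule_to_regexp(rule)

m = regexp.match("You wound Perl for 15 points of Readability damage.")
# Pair each field name with its capture, dropping fields that sat in
# optional sections which did not match.
fields = names.zip(m.captures).reject { |_, value| value.nil? }
puts fields.map { |name, value| "#{name}=#{value}" }.join(", ")
# prints: name=Perl, amount=15, kind=Readability
```

Tokenizing the rule once means the literal text is escaped piece by piece, so a rule that happens to contain runs of underscores cannot be confused with a placeholder.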