On Fri, Jun 27, 2008 at 5:56 PM, Matthew Moss <matthew.moss / gmail.com> wrote:

> ## Statistician I (#167)
>
> This week begins a three-part quiz, the final goal to provide a little
> library for parsing and analyzing line-based data. Hopefully, each portion
> of the larger problem is interesting enough on its own, without being too
> difficult to attempt. The first part -- this week's quiz -- will focus on
> the pattern matching.
>
> Let's look at a bit of example input:
>
>    You wound Perl for 15 points of Readability damage.
>    You wound Perl with Metaprogramming for 23 points of Usability damage.
>    Your mighty blow defeated Perl.
>    C++ walks into the arena.
>    C++ wounds you with Compiled Code for 37 points of Speed damage.
>    You wound C++ for 52 points of Usability damage.
>
> Okay, it's silly, but it is similar to a much larger data file I'll provide
> at the end for testing.
>
> You should definitely note the repetitiveness: just the sort of thing that
> we can automate. In fact, I've examined the input above and created three
> rules (a.k.a. patterns) that match (most of) the data:
>
>    [The ]<name> wounds you[ with <attack>] for <amount> point[s] of <kind>[ damage].
>    You wound[ the] <name>[ with <attack>] for <amount> point[s] of <kind>[ damage].
>    Your mighty blow defeated[ the] <name>.
>
> There are a few guidelines about these rules:
>
> 1. Text contained within square brackets is optional.
> 2. A word contained in angle brackets represents a field; not a literal
> match, but data to be remembered.
> 3. Fields are valid within optional portions.
> 4. You may assume that both the rules and the input lines are stripped of
> excess whitespace on both ends.
>
> Assuming the rules are in `rules.txt` and the input is in `data.txt`,
> running your Ruby script as such:
>
>    > ruby reporter.rb rules.txt data.txt
>
> Should generate the following output:
>
>    Rule 1: Perl, 15, Readability
>    Rule 1: Perl, Metaprogramming, 23, Usability
>    Rule 2: Perl
>    # No Match
>    Rule 0: C++, Compiled Code, 37, Speed
>    Rule 1: C++, 52, Usability
>
>    Unmatched input:
>    C++ walks into the arena.
>

Hi,

This is my attempt at this quiz. I thought it would be cool to store the
field names too, for each match. I also added a verbose output mode that
shows each field name alongside its value. Since flexibility was also a
goal, I wrapped everything in a few classes to prepare for the later parts:

class Match
  attr_accessor :captures, :mappings, :rule

  def initialize captures, mappings, rule
    @captures = captures
    @mappings = mappings
    @rule = rule
  end

  # Default: "Rule N: v1, v2, ...". Verbose: "[field => value]" pairs.
  def to_s verbose=false
    s = "Rule #{@rule.id}: "
    if verbose
      # Skip optional fields that did not take part in the match
      @rule.names.each_with_index do |n, i|
        s << "[#{n} => #{@mappings[n]}]" if @captures[i]
      end
      s
    else
      s + @captures.compact.join(", ")
    end
  end
end

class Rule
  attr_accessor :names, :id

  # Translate rule syntax to regexp syntax. The flag says whether the
  # first captured group (the field name) has to be remembered.
  RULE_MAPPINGS = {
    "[" => ["(?:", false],
    "]" => [")?", false],
    /<(.*?)>/ => ["(.*?)", true],
  }

  def initialize id, line
    @id = id
    @names = []
    escaped = escape(line)
    # Apply each mapping in turn to the escaped rule text
    reg = RULE_MAPPINGS.inject(escaped) do |acc, (tag, value)|
      replace, remember = *value
      acc.gsub(tag) do
        @names << $1 if remember
        replace
      end
    end
    @reg = Regexp.new(reg)
  end

  def escape line
    # Swap the regexp-sensitive chars used by the mappings for
    # non-sensitive placeholders so that we can Regexp.escape the line,
    # then sub them back
    escaped = line.gsub("[", "____").gsub("]", "_____")
    escaped = Regexp.escape(escaped)
    escaped.gsub("_____", "]").gsub("____", "[")
  end

  # Return a Match for the data line, or nil if the rule does not apply
  def match data
    m = @reg.match data
    return nil unless m
    map = Hash[*@names.zip(m.captures).flatten]
    Match.new m.captures, map, self
  end
end
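
As a side note, the placeholder trick in escape could probably be avoided
by walking the rule token by token and escaping only the literal pieces.
The following is a minimal standalone sketch of that idea, not the code
used above: rule_to_regexp is a hypothetical helper name, and anchoring
the pattern with \A and \z is an assumption of mine (the Rule class does
not anchor its regexp).

```ruby
# Hypothetical helper, not part of the solution above: translate one rule
# line into an anchored Regexp plus the list of field names, in order.
def rule_to_regexp(rule)
  names = []
  # Tokens: "[", "]", "<field>", a run of literal text, or a stray "<"
  source = rule.gsub(/\[|\]|<(.*?)>|[^\[\]<]+|</) do |token|
    case token
    when "[" then "(?:"          # open an optional, non-capturing group
    when "]" then ")?"           # close the group and make it optional
    when /\A<(.*?)>\z/           # a field: remember its name, capture lazily
      names << $1
      "(.*?)"
    else
      Regexp.escape(token)       # literal text, escaped as-is
    end
  end
  [Regexp.new("\\A#{source}\\z"), names]
end

rule = "[The ]<name> wounds you[ with <attack>] for <amount> point[s] of <kind>[ damage]."
regexp, names = rule_to_regexp(rule)

m = regexp.match("C++ wounds you with Compiled Code for 37 points of Speed damage.")
p names       # prints ["name", "attack", "amount", "kind"]
p m.captures  # prints ["C++", "Compiled Code", "37", "Speed"]
```

An optional field that takes no part in a match comes back as nil in
captures, which is why the output code drops nil values before printing.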

class RuleSet
  def initialize file
    @rules = []
    File.open(file) do |f|
      f.each_with_index {|line, i| @rules << Rule.new(i, line.chomp)}
    end
  end

  # Return the Match from the first rule that matches, or nil
  def apply data
    match = nil
    @rules.find {|r| match = r.match(data)}
    match
  end
end

rules_file = ARGV[0] || "rules.txt"
data_file = ARGV[1] || "data.txt"

rule_set = RuleSet.new rules_file

matches = nil
unmatched = []
File.open(data_file) do |f|
  matches = f.map do |line|
    m = rule_set.apply line.chomp
    unmatched << line unless m
    m
  end
end

matches.each do |m|
  if m
    puts m
  else
    puts "# No Match"
  end
end

unless unmatched.empty?
  puts
  puts "Unmatched input:"
  puts unmatched
end

#~ puts "Verbose output:"
#~ matches.each do |m|
#~ if m
#~ puts (m.to_s(true))
#~ else
#~ puts "#No match"
#~ end
#~ end
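
For reference, here is a rough standalone sketch of what the verbose
output boils down to: pair each field name with its captured value and
skip the optional fields that did not match. verbose_line is a
hypothetical helper name, and the space between pairs is my own choice
(Match#to_s concatenates the pairs without a separator).

```ruby
# Hypothetical helper: format one match as "[name => value]" pairs,
# skipping optional fields that did not take part in the match.
def verbose_line(rule_id, names, captures)
  pairs = names.zip(captures).reject { |_name, value| value.nil? }
  "Rule #{rule_id}: " + pairs.map { |name, value| "[#{name} => #{value}]" }.join(" ")
end

puts verbose_line(1, %w[name attack amount kind], ["Perl", nil, "15", "Readability"])
# prints: Rule 1: [name => Perl] [amount => 15] [kind => Readability]
```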