On Fri, Jun 27, 2008 at 5:56 PM, Matthew Moss <matthew.moss / gmail.com> wrote:

> ## Statistician I (#167)
>
> This week begins a three-part quiz, the final goal to provide a little
> library for parsing and analyzing line-based data. Hopefully, each portion
> of the larger problem is interesting enough on its own, without being too
> difficult to attempt. The first part -- this week's quiz -- will focus on
> the pattern matching.
>
> Let's look at a bit of example input:
>
>    You wound Perl for 15 points of Readability damage.
>    You wound Perl with Metaprogramming for 23 points of Usability damage.
>    Your mighty blow defeated Perl.
>    C++ walks into the arena.
>    C++ wounds you with Compiled Code for 37 points of Speed damage.
>    You wound C++ for 52 points of Usability damage.
>
> Okay, it's silly, but it is similar to a much larger data file I'll provide
> at the end for testing.
>
> You should definitely note the repetitiveness: just the sort of thing that
> we can automate. In fact, I've examined the input above and created three
> rules (a.k.a. patterns) that match (most of) the data:
>
>    [The ]<name> wounds you[ with <attack>] for <amount> point[s] of <kind>[ damage].
>    You wound[ the] <name>[ with <attack>] for <amount> point[s] of <kind>[ damage].
>    Your mighty blow defeated[ the] <name>.
>
> There are a few guidelines about these rules:
>
> 1. Text contained within square brackets is optional.
> 2. A word contained in angle brackets represents a field; not a literal
> match, but data to be remembered.
> 3. Fields are valid within optional portions.
> 4. You may assume that both the rules and the input lines are stripped of
> excess whitespace on both ends.
>
> Assuming the rules are in `rules.txt` and the input is in `data.txt`,
> running your Ruby script as such:
>
>    > ruby reporter.rb rules.txt data.txt
>
> Should generate the following output:
>
>    Rule 1: Perl, 15, Readability
>    Rule 1: Perl, Metaprogramming, 23, Usability
>    Rule 2: Perl
>    # No Match
>    Rule 0: C++, Compiled Code, 37, Speed
>    Rule 1: C++, 52, Usability
>
>    Unmatched input:
>    C++ walks into the arena.
>

Hi,

This is my attempt at this quiz. I thought it would be cool to store the
field names too, for each match, and I added a verbose output mode that
shows each field name with its value. Since the goal was also to be
flexible, I wrapped everything in a few classes to prepare for the future:

class Match
	attr_accessor :captures, :mappings, :rule
	
	def initialize captures, mappings, rule
		@captures = captures
		@mappings = mappings
		@rule = rule
	end

	def to_s verbose=false
		s = "Rule #{@rule.id}: "
		if verbose
			@rule.names.each_with_index {|n,i| s << "[#{n} => #{@mappings[n]}]" if @captures[i]}
			s
		else
			s + @captures.compact.join(", ")
		end
	end
end

class Rule
	attr_accessor :names, :id
	
	# Translate rule syntax to regexp fragments. The boolean says whether the
	# tag's first captured group (the field name) has to be remembered.
	RULE_MAPPINGS = {
		"[" => ["(?:", false],
		"]" => [")?", false],
		/<(.*?)>/ => ["(.*?)", true],
	}
	def initialize id, line
		@id = id
		@names = []
		escaped = escape(line)
		reg = RULE_MAPPINGS.inject(escaped) do |pattern, (tag, value)|
			replace, remember = value
			pattern.gsub(tag) do
				@names << $1 if remember
				replace
			end
		end
		@reg = Regexp.new(reg)
	end
	
	def escape line
		# Swap the brackets for placeholders that Regexp.escape leaves alone,
		# escape the line, then swap the brackets back. This assumes the rule
		# text itself contains no runs of four or more underscores.
		escaped = line.gsub("[", "____").gsub("]", "_____")
		escaped = Regexp.escape(escaped)
		escaped.gsub("_____", "]").gsub("____", "[")
	end
	
	def match data
		m = @reg.match data
		return nil unless m
		map = Hash[*@names.zip(m.captures).flatten]
		Match.new m.captures, map, self
	end
end

class RuleSet
	def initialize file
		@rules = []
		File.open(file) do |f|
			f.each_with_index {|line, i| @rules << Rule.new(i, line.chomp)}
		end
	end
	
	def apply data
		match = nil
		@rules.find {|r| match = r.match data}
		match
	end
end

rules_file = ARGV[0] || "rules.txt"
data_file = ARGV[1] || "data.txt"

rule_set = RuleSet.new rules_file

matches = nil
unmatched = []
File.open(data_file) do |f|
	matches = f.map do |line|
		m = rule_set.apply line.chomp
		unmatched << line unless m
		m
	end
end

matches.each do |m|
	if m
		puts m
	else
		puts "# No Match"
	end
end

unless unmatched.empty?
	puts "", "Unmatched input:"
	puts unmatched
end

#~ puts "Verbose output:"
#~ matches.each do |m|
	#~ if m
		#~ puts (m.to_s(true))
	#~ else
		#~ puts "#No match"
	#~ end
#~ end
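
The underscore placeholders in Rule#escape are a bit fragile, so here is a
minimal sketch of another way to do the same translation in a single pass.
It is not what the script above does: rule_to_regexp is a hypothetical
helper name, and anchoring the pattern with \A and \z is my own assumption
about what a rule should match (the whole line, not a substring).

```ruby
# Hypothetical helper, not part of the script above: translate one rule
# into an anchored Regexp plus the list of field names, in a single pass.
def rule_to_regexp(rule)
  names = []
  # Tokens are "[", "]", a <field>, or a run of literal text.
  source = rule.gsub(/\[|\]|<(.*?)>|[^\[\]<]+/) do |token|
    case token
    when "[" then "(?:"        # open an optional group
    when "]" then ")?"         # close it and make it optional
    when /\A<(.*)>\z/          # a field: remember its name, capture lazily
      names << $1
      "(.*?)"
    else
      Regexp.escape(token)     # literal text, escaped piece by piece
    end
  end
  [Regexp.new("\\A#{source}\\z"), names]
end

rule = "You wound[ the] <name>[ with <attack>] for <amount> point[s] of <kind>[ damage]."
regexp, names = rule_to_regexp(rule)

m = regexp.match("You wound C++ for 52 points of Usability damage.")
# Pair each field name with its capture and drop the fields that did not match.
fields = names.zip(m.captures).reject { |_, value| value.nil? }
puts fields.map { |name, value| "#{name}=#{value}" }.join(", ")
# prints: name=C++, amount=52, kind=Usability
```

Because only the literal runs go through Regexp.escape, the brackets never
need to be hidden and restored, and underscores in a rule are harmless.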