--N7cXnGaqHsTLxonszcP
Content-Type: text/plain
Content-Transfer-Encoding: 7bit

Here's my solution. Theres a bit more speed to be had out of it, but
I'll be in trouble if I spend any more time on it, so I'll just post it
up instead ;)

It's slower than some of the other solutions, but does seem to work. I
ran the attached benchmark to gauge performance against a regexp (using
Oniguruma under 1.9) and it's slow to build, but fast to match (I'm
lazy, using hashes for my tree):

$ ruby9 bench.rb
### CREATION (x10) ###
      user     system      total        real
regexp        1.430000   0.020000   1.450000 (  1.518385)
tree matcher  15.290000   0.070000  15.360000 ( 15.727852)
### MATCHING (x500) ###
      user     system      total        real
regexp         0.830000   0.000000   0.830000 (  0.865269)
tree matcher   0.010000   0.000000   0.010000 (  0.016855)

I also ran Ken's posted benchmark, under both 1.8 and (for fun) 1.9. My
solution suffers slightly, but check out the speed increase in Ken's own
solution:

$ ruby -v kbbench.rb
ruby 1.8.5 (2006-08-25) [i686-linux]
                          user     system      total        real
Edwin Fine -- fill    0.480000   0.020000   0.500000 (  0.536963)
Edwin Fine -- test    3.310000   0.010000   3.320000 (  3.383446)
Ken Bloom -- fill     2.460000   0.070000   2.530000 (  2.569271)
Ken Bloom -- test    13.300000   0.030000  13.330000 ( 13.830023)
Ross Bamford -- fill  1.570000   0.020000   1.590000 (  1.635048)
Ross Bamford -- test  5.470000   0.030000   5.500000 (  5.595892)

$ ruby9 -v kbbench.rb
ruby 1.9.0 (2006-11-26 patchlevel 0) [i686-linux]
                          user     system      total        real
Edwin Fine -- fill    0.460000   0.010000   0.470000 (  0.530037)
Edwin Fine -- test    2.990000   0.000000   2.990000 (  3.947945)
Ken Bloom -- fill     3.070000   0.060000   3.130000 (  3.273754)
Ken Bloom -- test     3.110000   0.010000   3.120000 (  3.184959)
Ross Bamford -- fill  1.270000   0.000000   1.270000 (  1.314763)
Ross Bamford -- test  6.070000   0.010000   6.080000 (  6.164216)

Thanks for a fun quiz. Now I'd better get some work done... :)

-- 
Ross Bamford - rosco / roscopeco.REMOVE.co.uk

--N7cXnGaqHsTLxonszcP
Content-Disposition: attachment; filename=bench.rb
Content-Type: application/x-ruby; name=bench.rb
Content-Transfer-Encoding: 7bit

require 'dm'
require 'benchmark'

words  ile.read('words_en.txt').split.map { |s| s.split(' ') }.flatten
str  ile.read('practical-file-system-design.txt')

rxp, ndm, dm  il, nil, nil
iters  0

puts "### CREATION (x#{iters}) ###"
Benchmark.bm do |x|
  if RUBY_VERSION > 1.9"
    x.report('regexp      ') do
      iters.times { rxp  egexp.new(words.map { |w| Regexp.escape(w) }.join('|')) }
    end
  end
  x.report('tree matcher ') do
    iters.times { dm  ictionaryMatcher.new(*words) }
  end 
end

iters  00

puts "### MATCHING (x#{iters}) ###"
Benchmark.bm do |x|
  if RUBY_VERSION > 1.9"
    x.report('regexp       ') do
      iters.times { rxp str }
    end
  end 
  x.report('tree matcher ') do
    iters.times { dm str }
  end 
end


--N7cXnGaqHsTLxonszcP
Content-Disposition: attachment; filename=dm.rb
Content-Type: application/x-ruby; name=dm.rb
Content-Transfer-Encoding: base64

IyBUaGlzIGlzIGEgZGljdGlvbmFyeSBtYXRjaGVyIHVzaW5nIGEgc2ltcGxlIHByZWZpeCB0cmVl
CiMgaW1wbGVtZW50YXRpb24uIEl0IGhhcyBhbGwgdGhlIGFkZGl0aW9uYWwgZ3ViYmlucywgdG9v
LCBhbmQgaXMKIyBwcmV0dHkgcXVpY2sgd2hlbiBpdCBjb21lcyB0byBsYXJnZSBpbnB1dCBzZXRz
LgpjbGFzcyBEaWN0aW9uYXJ5TWF0Y2hlcgogIGNsYXNzIE1hdGNoRGF0YQogICAgZGVmIGluaXRp
YWxpemUoc3RhcnRfb2Zmc2V0LCBlbmRfb2Zmc2V0LCBtYXRjaCwgc3RyaW5nKQogICAgICBAc3Rh
cnRfb2Zmc2V0LCBAZW5kX29mZnNldCwgQG1hdGNoLCBAc3RyaW5nID0gCiAgICAgICAgICBzdGFy
dF9vZmZzZXQsIGVuZF9vZmZzZXQsIG1hdGNoLCBzdHJpbmcKICAgICAgCiAgICAgIEBvZmZzZXQg
PSBbc3RhcnRfb2Zmc2V0LCBlbmRfb2Zmc2V0XQogICAgICBAbGVuZ3RoID0gKGVuZF9vZmZzZXQg
LSBzdGFydF9vZmZzZXQpICsgMQogICAgZW5kCgogICAgYXR0cl9yZWFkZXIgOnN0YXJ0X29mZnNl
dCwgOmVuZF9vZmZzZXQsIDptYXRjaCwgOnN0cmluZywgOm9mZnNldCwgOmxlbmd0aAogICAgICAK
ICAgIGRlZiBwcmVfbWF0Y2gKICAgICAgQHByZV9tYXRjaCB8fD0gQHN0cmluZ1swLi4uQHN0YXJ0
X29mZnNldF0KICAgIGVuZAoKICAgIGRlZiBwb3N0X21hdGNoCiAgICAgIEBwb3N0X21hdGNoIHx8
PSBAc3RyaW5nWyhAZW5kX29mZnNldCArIDEpLi4tMV0KICAgIGVuZAoKICAgIGRlZiBjaGFyX2Nv
dW50CiAgICAgIEBjaGFyX2NvdW50IHx8PSBAbWF0Y2guc2NhbigvLi9tdSkubGVuZ3RoCiAgICBl
bmQKCiAgICBkZWYgdG9fcwogICAgICBAbWF0Y2gKICAgIGVuZAogICAgCiAgICBkZWYgaW5zcGVj
dAogICAgICAiIzxNYXRjaDoje0BzdGFydF9vZmZzZXR9OiAje3RvX3MuaW5zcGVjdH0+IgogICAg
ZW5kCgogICAgZGVmID09KG8pCiAgICAgIGlmIG8ucmVzcG9uZF90byg6dG9fc3RyKQogICAgICAg
IEBtYXRjaCA9PSBvCiAgICAgIGVsc2UKICAgICAgICBzdXBlcgogICAgICBlbmQKICAgIGVuZAog
IGVuZAoKICBjbGFzcyBFbnVtZXJhYmxlTWF0Y2hlcgogICAgaW5jbHVkZSBFbnVtZXJhYmxlCiAg
ICAKICAgIGRlZiBpbml0aWFsaXplKGRtLCBzdHIpCiAgICAgIEBkbSwgQHN0ciA9IGRtLCBzdHIK
ICAgIGVuZAoKICAgIGRlZiBlYWNoCiAgICAgIHN0ciwgb2ZzID0gQHN0ciwgMAoKICAgICAgd2hp
bGUgbWQgPSBAZG0ubWF0Y2goc3RyLCBvZnMpCiAgICAgICAgeWllbGQgbWQKICAgICAgICBvZnMg
PSBtZC5zdGFydF9vZmZzZXQgKyBtZC5sZW5ndGggKyAxIAogICAgICBlbmQKCiAgICAgIHNlbGYK
ICAgIGVuZAogIGVuZAoKICBkZWYgaW5pdGlhbGl6ZSgqd29yZHMpICAgIAogICAgQHB0ID0ge30K
ICAgIHdvcmRzLmVhY2ggeyB8d29yZHwgc2VsZiA8PCB3b3JkIH0KICBlbmQKCiAgZGVmIDw8KHdv
cmQpCiAgICAjIHNtYWxsIG1lbW9yeSBvcHRpbWl6YXRpb24gLSBpZiB0aGVyZSdzIGEgbG9uZ2Vy
IHdvcmQgdGhhdCBzaGFyZXMKICAgICMgdGhpcyBwcmVmaXgsIHdlIGNhbiBkaXNjYXJkIGl0IHNp
bmNlIHdlJ2xsIG9ubHkgZXZlciB0YWtlIHRoZSAKICAgICMgc2hvcnRlc3QgbWF0Y2ggYW55d2F5
LgogICAgd29yZC5zcGxpdCgnJykuaW5qZWN0KEBwdCkgZG8gfHB0LCBjaHJ8IAogICAgICBwdFtj
aHJdIHx8PSB7fSAKICAgIGVuZC5jbGVhcls6X19XT1JEX19dID0gdHJ1ZQoKICAgIHNlbGYKICBl
bmQKCiAgZGVmIGluY2x1ZGU/KHdvcmQpCiAgICBpZiBtZCA9IHNlbGYubWF0Y2god29yZCkKICAg
ICAgbWQubWF0Y2ggPT0gd29yZAogICAgZWxzZQogICAgICBmYWxzZQogICAgZW5kCiAgZW5kCgog
IGRlZiB0b19lbnVtKHN0cikKICAgIEVudW1lcmFibGVNYXRjaGVyLm5ldyhzZWxmLCBzdHIpCiAg
ZW5kCgogIGRlZiBzY2FuKHN0ciwgJmJsaykKICAgIGlmIGJsawogICAgICB0b19lbnVtKHN0ciku
ZWFjaCgmYmxrKQogICAgICBzdHIKICAgIGVsc2UKICAgICAgYSA9IFtdCiAgICAgIHRvX2VudW0o
c3RyKS5lYWNoIHsgfG1kfCBhIDw8IG1kLm1hdGNoIH0KICAgICAgYQogICAgZW5kCiAgZW5kCiAg
CiAgZGVmIG1hdGNoKHN0ciwgc3RhcnRfb2ZzID0gMCkKICAgIHN0YXJ0X29mcy51cHRvKHN0ci5s
ZW5ndGgpIGRvIHxpfAogICAgICB3b3JkID0gIiIKICAgICAgbmV4dF9wdCA9IEBwdAogICAgICBz
aSA9IGkKICAgICAgd2hpbGUgbmV4dF9wdCA9IG5leHRfcHRbY2hyID0gc3RyW2ksMV1dCiAgICAg
ICAgd29yZCA8PCBjaHIgICAgICAgIAogICAgICAgIHJldHVybiBNYXRjaERhdGEubmV3KHNpLCBp
LCB3b3JkLCBzdHIpIGlmIG5leHRfcHRbOl9fV09SRF9fXQogICAgICAgIGkrPTEKICAgICAgZW5k
CiAgICBlbmQKCiAgICBuaWwKICBlbmQKCiAgZGVmID1+KHN0cikKICAgIG0gPSBtYXRjaChzdHIp
IGFuZCBtLnN0YXJ0X29mZnNldAogIGVuZAogICAgICAgCiAgZGVmIGluc3BlY3QKICAgIEBwdC5p
bnNwZWN0CiAgZW5kCmVuZAoKUm9zc0JhbWZvcmREaWN0aW9uYXJ5TWF0Y2hlciA9IERpY3Rpb25h
cnlNYXRjaGVyCgppZiAkMCA9PSBfX0ZJTEVfXwogIHJlcXVpcmUgJ3Rlc3QvdW5pdCcKCiAgY2xh
c3MgVENfRE1fMDEgPCBUZXN0OjpVbml0OjpUZXN0Q2FzZQogICAgZGVmIHNldHVwCiAgICAgIEBk
bSA9IERpY3Rpb25hcnlNYXRjaGVyLm5ldwogICAgICAKICAgICAgQGRtIDw8ICJzdHJpbmciCiAg
ICAgIEBkbSA8PCAiUnVieSIKICAgIGVuZAogICAgCiAgICBkZWYgdGVzdF9xdWl6X2luY2x1ZGU/
CiAgICAgIGFzc2VydCBAZG0uaW5jbHVkZT8oIlJ1YnkiKQogICAgICBhc3NlcnQgIUBkbS5pbmNs
dWRlPygibWlzc2luZyIpCiAgICAgIGFzc2VydCAhQGRtLmluY2x1ZGU/KCJzdHJpbmdpbmcgeW91
IGFsb25nIikKICAgIGVuZAoKICAgIGRlZiB0ZXN0X3F1aXpfbWF0Y2hvcAogICAgICBhc3NlcnRf
ZXF1YWwgNSwgQGRtID1+ICJsb25nIHN0cmluZyIKICAgICAgYXNzZXJ0X25pbCBAZG0gPX4gInJ1
YiB5b3UgdGhlIHdyb25nIHdheSIKICAgICAgCiAgICAgIGFzc2VydF9lcXVhbCA1LCBAZG0gPX4g
ImxvbmcgUnVieSBzdHJpbmciCiAgICAgIGFzc2VydF9uaWwgQGRtID1+ICJsb25nIHJ1Ynkgc3Ry
IgoKICAgICAgYXNzZXJ0X2VxdWFsIDUsICdsb25nX3N0cmluZycgPX4gQGRtCiAgICBlbmQKCiAg
ICBkZWYgdGVzdF9tYXRjaF9kYXRhXzAxCiAgICAgIGFzc2VydF9uaWwgQGRtLm1hdGNoKCdydWIg
eW91IHRoZSB3cm9uZyB3YXknKQogICAgZW5kCgogICAgZGVmIHRlc3RfbWF0Y2hfZGF0YV8wMgog
ICAgICBhc3NlcnRfaW5zdGFuY2Vfb2YgRGljdGlvbmFyeU1hdGNoZXI6Ok1hdGNoRGF0YSwgbWQg
PSBAZG0ubWF0Y2goJ2xvbmcgc3RyaW5nJykKICAgICAgCiAgICAgIGFzc2VydF9lcXVhbCA1LCBt
ZC5zdGFydF9vZmZzZXQKICAgICAgYXNzZXJ0X2VxdWFsIDEwLCBtZC5lbmRfb2Zmc2V0CiAgICAg
IGFzc2VydF9lcXVhbCBbNSwxMF0sIG1kLm9mZnNldAogICAgICBhc3NlcnRfZXF1YWwgImxvbmcg
IiwgbWQucHJlX21hdGNoCiAgICAgIGFzc2VydF9lcXVhbCAic3RyaW5nIiwgbWQubWF0Y2gKICAg
ICAgYXNzZXJ0X2VxdWFsICIiLCBtZC5wb3N0X21hdGNoCiAgICBlbmQKICAgIAogICAgZGVmIHRl
c3RfbWF0Y2hfZGF0YV8wMwogICAgICBhc3NlcnRfaW5zdGFuY2Vfb2YgRGljdGlvbmFyeU1hdGNo
ZXI6Ok1hdGNoRGF0YSwgbWQgPSBAZG0ubWF0Y2goJ2xvbmcgc3RyaW5nIHRvbycpCiAgICAgIAog
ICAgICBhc3NlcnRfZXF1YWwgNSwgbWQuc3RhcnRfb2Zmc2V0CiAgICAgIGFzc2VydF9lcXVhbCAx
MCwgbWQuZW5kX29mZnNldAogICAgICBhc3NlcnRfZXF1YWwgWzUsMTBdLCBtZC5vZmZzZXQKICAg
ICAgYXNzZXJ0X2VxdWFsICJsb25nICIsIG1kLnByZV9tYXRjaAogICAgICBhc3NlcnRfZXF1YWwg
InN0cmluZyIsIG1kLm1hdGNoCiAgICAgIGFzc2VydF9lcXVhbCAiIHRvbyIsIG1kLnBvc3RfbWF0
Y2gKICAgIGVuZAogIGVuZAogIAogIGNsYXNzIFRDX0RNXzAyIDwgVGVzdDo6VW5pdDo6VGVzdENh
c2UKICAgIGRlZiBzZXR1cAogICAgICBAZG0gPSBEaWN0aW9uYXJ5TWF0Y2hlci5uZXcKICAgICAg
CiAgICAgIEBkbSA8PCAianVzdCIKICAgICAgQGRtIDw8ICJkYW0iCiAgICAgIEBkbSA8PCAiZGFt
YWdlIgogICAgZW5kCgogICAgZGVmIHRlc3RfYWx3YXlzX2ZpbmRzX2ZpcnN0X21hdGNoCiAgICAg
IGFzc2VydF9pbnN0YW5jZV9vZiBEaWN0aW9uYXJ5TWF0Y2hlcjo6TWF0Y2hEYXRhLCBtZCA9IEBk
bS5tYXRjaCgnanVzdCBkYW1hZ2VkJykKICAgICAgCiAgICAgIGFzc2VydF9lcXVhbCAwLCBtZC5z
dGFydF9vZmZzZXQKICAgICAgYXNzZXJ0X2VxdWFsIDMsIG1kLmVuZF9vZmZzZXQKICAgICAgYXNz
ZXJ0X2VxdWFsIFswLDNdLCBtZC5vZmZzZXQKICAgICAgYXNzZXJ0X2VxdWFsICIiLCBtZC5wcmVf
bWF0Y2gKICAgICAgYXNzZXJ0X2VxdWFsICJqdXN0IiwgbWQubWF0Y2gKICAgICAgYXNzZXJ0X2Vx
dWFsICIgZGFtYWdlZCIsIG1kLnBvc3RfbWF0Y2gKICAgIGVuZAogICAgCiAgICBkZWYgdGVzdF9h
bHdheXNfZmluZHNfc2hvcnRlc3RfbWF0Y2gKICAgICAgYXNzZXJ0X2luc3RhbmNlX29mIERpY3Rp
b25hcnlNYXRjaGVyOjpNYXRjaERhdGEsIG1kID0gQGRtLm1hdGNoKCd3aGVuIGRhbWFnZWQnKQog
ICAgICAKICAgICAgYXNzZXJ0X2VxdWFsIDUsIG1kLnN0YXJ0X29mZnNldAogICAgICBhc3NlcnRf
ZXF1YWwgNywgbWQuZW5kX29mZnNldAogICAgICBhc3NlcnRfZXF1YWwgWzUsN10sIG1kLm9mZnNl
dAogICAgICBhc3NlcnRfZXF1YWwgIndoZW4gIiwgbWQucHJlX21hdGNoCiAgICAgIGFzc2VydF9l
cXVhbCAiZGFtIiwgbWQubWF0Y2gKICAgICAgYXNzZXJ0X2VxdWFsICJhZ2VkIiwgbWQucG9zdF9t
YXRjaAogICAgZW5kCgogICAgZGVmIHRlc3Rfc2Nhbl8wMQogICAgICBAZG0gPDwgJ3doZW4nCiAg
ICAgIGFzc2VydF9lcXVhbCBbIndoZW4iLCAiZGFtIl0sIEBkbS5zY2FuKCdkbyBub3Qgb3BlbiB3
aGVuIGRhbWFnZWQnKQogICAgZW5kCiAgICAKICAgIGRlZiB0ZXN0X3NjYW5fMDIKICAgICAgQGRt
IDw8ICd3aGVuJwogICAgICBhID0gW10KICAgICAgc2NhbnIgPSBAZG0uc2NhbignZG8gbm90IG9w
ZW4gd2hlbiBkYW1hZ2VkJykgeyB8bWR8IGEgPDwgbWQubWF0Y2ggfQogICAgICBhc3NlcnRfZXF1
YWwgJ2RvIG5vdCBvcGVuIHdoZW4gZGFtYWdlZCcsIHNjYW5yCiAgICAgIGFzc2VydF9lcXVhbCBb
IndoZW4iLCAiZGFtIl0sIGEKICAgIGVuZAogICAgCiAgICBkZWYgdGVzdF90b19lbnVtXzAxCiAg
ICAgIEBkbSA8PCAnd2hlbicKICAgICAgZW51bSA9IEBkbS50b19lbnVtKCdkbyBub3Qgb3BlbiB3
aGVuIGRhbWFnZWQnKQogICAgICBhc3NlcnRfZXF1YWwgWyJ3aGVuIiwgImRhbSJdLCBlbnVtLmlu
amVjdChbXSkgeyB8YSxtZHwgYSA8PCBtZC5tYXRjaCB9CiAgICAgIGFzc2VydF9lcXVhbCBbIndo
ZW4iXSwgZW51bS5zZWxlY3QgeyB8bWR8IG1kLm1hdGNoID1+IC9edy8gfS5tYXAgeyB8bWR8IG1k
Lm1hdGNoIH0KICAgICAgYXNzZXJ0X2VxdWFsIFsiZGFtIl0sIGVudW0ucmVqZWN0IHsgfG1kfCBt
ZC5tYXRjaCA9fiAvXncvIH0ubWFwIHsgfG1kfCBtZC5tYXRjaCB9CiAgICBlbmQgICAgCiAgZW5k
CgogIGNsYXNzIFRDX0RNXzAzIDwgVGVzdDo6VW5pdDo6VGVzdENhc2UKICAgIGRlZiBzZXR1cAog
ICAgICBAZG0gPSBEaWN0aW9uYXJ5TWF0Y2hlci5uZXcKICAgICAgQGRtIDw8ICfjg4InIAogICAg
ICBAZG0gPDwgJ2ZybycgCiAgICAgIEBkbSA8PCAn44GgJwogICAgZW5kCgogICAgZGVmIHRlc3Rf
dW5pY29kZV9hd2FyZV8wMQogICAgICBhc3NlcnRfaW5zdGFuY2Vfb2YgRGljdGlvbmFyeU1hdGNo
ZXI6Ok1hdGNoRGF0YSwgCiAgICAgICAgICBtZCA9IEBkbS5tYXRjaCgn44OCIGlzJykKICAgICAg
CiAgICAgIGFzc2VydF9lcXVhbCAwLCBtZC5zdGFydF9vZmZzZXQKICAgICAgYXNzZXJ0X2VxdWFs
IDIsIG1kLmVuZF9vZmZzZXQKICAgICAgYXNzZXJ0X2VxdWFsIFswLDJdLCBtZC5vZmZzZXQKICAg
ICAgYXNzZXJ0X2VxdWFsIDMsIG1kLmxlbmd0aAogICAgICBhc3NlcnRfZXF1YWwgMSwgbWQuY2hh
cl9jb3VudAogICAgICBhc3NlcnRfZXF1YWwgIiIsIG1kLnByZV9tYXRjaAogICAgICBhc3NlcnRf
ZXF1YWwgIuODgiIsIG1kLm1hdGNoCiAgICAgIGFzc2VydF9lcXVhbCAiIGlzIiwgbWQucG9zdF9t
YXRjaAogICAgZW5kCgogICAgZGVmIHRlc3RfdW5pY29kZV9hd2FyZV8wMgogICAgICBhc3NlcnRf
aW5zdGFuY2Vfb2YgRGljdGlvbmFyeU1hdGNoZXI6Ok1hdGNoRGF0YSwKICAgICAgICBtZCA9IEBk
bS5tYXRjaCgnLCDjgaAgaXMgZnInKQogICAgICAKICAgICAgYXNzZXJ0X2VxdWFsIDIsIG1kLnN0
YXJ0X29mZnNldAogICAgICBhc3NlcnRfZXF1YWwgNCwgbWQuZW5kX29mZnNldAogICAgICBhc3Nl
cnRfZXF1YWwgWzIsNF0sIG1kLm9mZnNldAogICAgICBhc3NlcnRfZXF1YWwgMywgbWQubGVuZ3Ro
CiAgICAgIGFzc2VydF9lcXVhbCAxLCBtZC5jaGFyX2NvdW50CiAgICAgIGFzc2VydF9lcXVhbCAi
LCAiLCBtZC5wcmVfbWF0Y2gKICAgICAgYXNzZXJ0X2VxdWFsICLjgaAiLCBtZC5tYXRjaAogICAg
ICBhc3NlcnRfZXF1YWwgIiBpcyBmciIsIG1kLnBvc3RfbWF0Y2gKICAgIGVuZAogICAgCiAgICBk
ZWYgdGVzdF91bmljb2RlX2F3YXJlXzAzCiAgICAgIGEgPSBbXQogICAgICBAZG0uc2Nhbign44OC
IGlzIGZyb20gS2F0YWthbmEsIOOBoCBpcyBmcm9tIEhpcmFnYW5hJykgeyB8bWR8IGEgPDwgbWQu
bWF0Y2ggfQogICAgICBhc3NlcnRfZXF1YWwgWyfjg4InLCdmcm8nLCfjgaAnLCAnZnJvJ10sIGEK
ICAgIGVuZAogIGVuZAplbmQK


--N7cXnGaqHsTLxonszcP--