On Tue, 22 Feb 2005 16:59:02 +0900
Brian Schröäer <ruby.brian / gmail.com> wrote:

> Hello Jordi,
> 
> you found a bug. Your number makes my code enter into an infinite
> loop. I'll fix it and repost.
> 
> Thanks,
> 
> Brian
> 
> 
> On Tue, 22 Feb 2005 06:26:16 +0900, Jordi Bunster <jordi / bunster.org> wrote:
> > 
> > On Feb 20, 2005, at 3:50 PM, Brian Schröäer wrote:
> > 
> > > This version loads the dictionary a lot faster (3 times as fast as the
> > > old
> > > version) because it does not create as many intermediate objects. Also
> > > I measured that upcasing and gsub'ing is faster on the long string
> > > than on the individual short strings.
> > 
> > Must be something wrong with my old iBook, because here it takes
> > aaaaages for anything to even appear.
> > 
> > Makes me wonder if I'm running it correctly:
> > 
> > echo 5467945 | ruby phoneword.rb -v -p -s -d /usr/share/dict/words
> > 
> > Or is the problem just that I/O intensive?
> > 
> > --
> >         Jordi
> > 
> >
> 

Thanks to Jordi who found a testcase that broke my code I reworked the solution. Now I use a different approach to skipping numbers. I create the possible skips for a given number, ignore the skipped numbers, search a real solution for the rest and reinject the skipped numbers into the solution.

That is a lot nicer, and also the code is now cleaner. Additionally I
improved loading of the wordlist once again, made -v a true verbose
switch and added abit of description.

I hope I'm not annoying people with this long posts.

As always: The nice and colorfull solutions is at:

http://ruby.brian-schroeder.de/quiz/phoneword/

or directly at

http://ruby.brian-schroeder.de/quiz/phoneword/browse/phoneword-rb.html

Best regards,

Brian


#!/usr/bin/ruby # Nodes in the Dictionary. class DictionaryNode < Array # Terminal info attr_reader :words def initialize super(10) @words = [] end end # A tree-indexed version of the dictionary that allows efficent searching by number 2 alphabet mapping. class Dictionary def initialize(encoding) super() @encoding = {} @inverse_encoding = {} encoding.each do | k, v | @encoding[k] = v.split(/\s+/).map{|c| c[0]} end # Create map from characters to numbers @inverse_encoding = @encoding.inject({}) { | r, (k, v) | v.each do | l | r[l] = k end r } @root = DictionaryNode.new end # Helper method for rekursive adding of words to the dictionary private def add_recursive(node, word, position) if word.length == position node.words << word return node end add_recursive(node[@inverse_encoding[word[position]]] ||= DictionaryNode.new, word, position + 1) end # Add words to the dictionary public def add(word) add_recursive(@root, word, 0) self end # Load a wordlist from a file, which contains one word per line. # Ignores punctuation and whitespace. def load_wordlist(file, options) $stderr.print "Loading dictionary... " if options.verbose start = Time.new file.read.gsub(/[^A-Za-z\n]/, '').upcase!.split($/).uniq!.each do |w| w.chomp! next if w.empty? or w.length <= options.min_length self.add(w) end $stderr.puts "built dictionary in %f seconds" % (Time.new-start).to_f if options.verbose self end private # Search words and return (in the block) words and the unmatched rest of the number def sub_find(node, number, &block) # Return words found so far block[node.words.map{|w|w.dup}, number] unless node.words.empty? # No more digits, so stop searching here return node if number.empty? # Search for longer words sub_find(node[number[0]], number[1..-1], &block) if node[number[0]] end private # Calculate all allowed skip patterns for a number of a given length def skips(s, length) return [s] if length == 0 result = skips(s + [false], length-1) result.concat(skips(s + [true], length-1)) unless s[-1] result end public # Skipping makes this a bit ugly def find_noskip(number) result = [] sub_find(@root, number) do | words, rest_number | if rest_number.empty? result.concat(words) else find_noskip(rest_number).each do | sentence | words.each do | w | result << w + '-' + sentence end end end end result end # Skipping makes this a bit ugly def find(number) result = [] skips([], number.length).each do | skipped | # Create the injector that can inject the skipped numbers back into the word injector = [] skipped.zip(number).each_with_index do |(s,n), i| injector << [n.to_s, i] if s end # We search for words built from the unskipped digits unskipped_digits = number.zip(skipped).select{|(d, s)| !s}.map{|(d,s)|d} sentences = find_noskip(unskipped_digits) # Inject the skipped digits back into the found sentences sentences.each do | s | injector.each do | (n, i) | s.insert(i, n) end end result.concat(sentences) end result end end encodings = { :james => { 2 => 'A B C', 3 => 'D E F', 4 => 'G H I', 5 => 'J K L', 6 => 'M N O', 7 => 'P Q R S', 8 => 'T U V', 9 => 'W X Y Z'}, :logic => { 0 => 'A B', 1 => 'C D', 2 => 'E F', 3 => 'G H', 4 => 'I J K', 5 => 'L M N', 6 => 'O P Q', 7 => 'R S T', 8 => 'U V W', 9 => 'X Y Z' } } require 'optparse' class PhonewordOptions < OptionParser attr_reader :dictionary, :encoding, :format, :allow_skips, :help, :encoding_help, :verbose, :min_length def initialize super() @dictionary = '/usr/share/dict/words' @encoding = :james @format = :plain @allow_skips = true @help = false @encoding_help = false @verbose = false @ignore_non_alpha = false @min_length = 1 self.on("-d", "--dictionary DICTIONARY", String) { | v | @dictionary = v } self.on("-e", "--encoding ENCODING", String, "How the alphabet is encoded to phonenumbers. james or logic are supported.") { | v | @encoding = v.downcase.to_sym } self.on("-p", "--plain", 'One result per found number, no other information. (Default)') { @format = :plain } self.on("-f", "--full", 'Prefix the result with the number') { @format = :full } self.on("-v", "--verbose", 'Make more noise') { @verbose = true } self.on("-s", "--skips", "--allow_skips", "--allow-skips", 'Allow to skip one adjacent number while matching. (Default)', 'Gives lots of ugly results, but james asked for it.') { @allow_skips = true } self.on("-c", "--no-skips", "Don't leave numbers in the detected words") { @allow_skips = false } self.on("-m" "--min-length", "Minimum length of accepted words.", "Use this to ignore one-letter words that make the output quite uninteresting.", Integer) { | v | @min_length = v } self.on("-?", "--help") { @help = true } self.on("--supported-encodings", "--encoding-help", "List the supported encodings") { @encoding_help = true } end end options = PhonewordOptions.new begin options.parse!(ARGV) rescue => e puts e puts options exit end if options.help puts options exit end if options.encoding_help or !encodings[options.encoding] puts "Possible encodings:" puts encodings.to_a.sort_by{|(k,v)|k.to_s}.map{|(k,v)| "#{k}:\n"+v.map{|(n,e)|" #{n}: #{e}"}.sort.join("\n")} exit end dictionary = Dictionary.new(encodings[options.encoding]).load_wordlist(File.open(options.dictionary), options) output = { :plain => lambda do | number, sentence | sentence end, :full => lambda do | number, sentence | "#{number.ljust(15)}: #{sentence}" end } method = {true => :find, false => :find_noskip } ARGF.each do | number | number.strip! number = number.gsub(/[^0-9]/, '').unpack('C*').map{|n|n - ?0} $stderr.puts "Searching for #{number}" if options.verbose dictionary.send(method[options.allow_skips], number).each do | sentence | puts output[options.format][number, sentence] end end