On Tue, 22 Feb 2005 16:59:02 +0900 Brian Schr?der <ruby.brian / gmail.com> wrote: > Hello Jordi, > > you found a bug. Your number makes my code enter into an infinite > loop. I'll fix it and repost. > > Thanks, > > Brian > > > On Tue, 22 Feb 2005 06:26:16 +0900, Jordi Bunster <jordi / bunster.org> wrote: > > > > On Feb 20, 2005, at 3:50 PM, Brian Schr?der wrote: > > > > > This version loads the dictionary a lot faster (3 times as fast as the > > > old > > > version) because it does not create as many intermediate objects. Also > > > I measured that upcasing and gsub'ing is faster on the long string > > > than on the individual short strings. > > > > Must be something wrong with my old iBook, because here it takes > > aaaaages for anything to even appear. > > > > Makes me wonder if I'm running it correctly: > > > > echo 5467945 | ruby phoneword.rb -v -p -s -d /usr/share/dict/words > > > > Or is the problem just that I/O intensive? > > > > -- > > Jordi > > > > > Thanks to Jordi who found a testcase that broke my code I reworked the solution. Now I use a different approach to skipping numbers. I create the possible skips for a given number, ignore the skipped numbers, search a real solution for the rest and reinject the skipped numbers into the solution. That is a lot nicer, and also the code is now cleaner. Additionally I improved loading of the wordlist once again, made -v a true verbose switch and added abit of description. I hope I'm not annoying people with this long posts. As always: The nice and colorfull solutions is at: http://ruby.brian-schroeder.de/quiz/phoneword/ or directly at http://ruby.brian-schroeder.de/quiz/phoneword/browse/phoneword-rb.html Best regards, Brian
#!/usr/bin/ruby # Nodes in the Dictionary. class DictionaryNode < Array # Terminal info attr_reader :words def initialize super(10) @words = [] end end # A tree-indexed version of the dictionary that allows efficent searching by number 2 alphabet mapping. class Dictionary def initialize(encoding) super() @encoding = {} @inverse_encoding = {} encoding.each do | k, v | @encoding[k] = v.split(/\s+/).map{|c| c[0]} end # Create map from characters to numbers @inverse_encoding = @encoding.inject({}) { | r, (k, v) | v.each do | l | r[l] = k end r } @root = DictionaryNode.new end # Helper method for rekursive adding of words to the dictionary private def add_recursive(node, word, position) if word.length == position node.words << word return node end add_recursive(node[@inverse_encoding[word[position]]] ||= DictionaryNode.new, word, position + 1) end # Add words to the dictionary public def add(word) add_recursive(@root, word, 0) self end # Load a wordlist from a file, which contains one word per line. # Ignores punctuation and whitespace. def load_wordlist(file, options) $stderr.print "Loading dictionary... " if options.verbose start = Time.new file.read.gsub(/[^A-Za-z\n]/, '').upcase!.split($/).uniq!.each do |w| w.chomp! next if w.empty? or w.length <= options.min_length self.add(w) end $stderr.puts "built dictionary in %f seconds" % (Time.new-start).to_f if options.verbose self end private # Search words and return (in the block) words and the unmatched rest of the number def sub_find(node, number, &block) # Return words found so far block[node.words.map{|w|w.dup}, number] unless node.words.empty? # No more digits, so stop searching here return node if number.empty? # Search for longer words sub_find(node[number[0]], number[1..-1], &block) if node[number[0]] end private # Calculate all allowed skip patterns for a number of a given length def skips(s, length) return [s] if length == 0 result = skips(s + [false], length-1) result.concat(skips(s + [true], length-1)) unless s[-1] result end public # Skipping makes this a bit ugly def find_noskip(number) result = [] sub_find(@root, number) do | words, rest_number | if rest_number.empty? result.concat(words) else find_noskip(rest_number).each do | sentence | words.each do | w | result << w + '-' + sentence end end end end result end # Skipping makes this a bit ugly def find(number) result = [] skips([], number.length).each do | skipped | # Create the injector that can inject the skipped numbers back into the word injector = [] skipped.zip(number).each_with_index do |(s,n), i| injector << [n.to_s, i] if s end # We search for words built from the unskipped digits unskipped_digits = number.zip(skipped).select{|(d, s)| !s}.map{|(d,s)|d} sentences = find_noskip(unskipped_digits) # Inject the skipped digits back into the found sentences sentences.each do | s | injector.each do | (n, i) | s.insert(i, n) end end result.concat(sentences) end result end end encodings = { :james => { 2 => 'A B C', 3 => 'D E F', 4 => 'G H I', 5 => 'J K L', 6 => 'M N O', 7 => 'P Q R S', 8 => 'T U V', 9 => 'W X Y Z'}, :logic => { 0 => 'A B', 1 => 'C D', 2 => 'E F', 3 => 'G H', 4 => 'I J K', 5 => 'L M N', 6 => 'O P Q', 7 => 'R S T', 8 => 'U V W', 9 => 'X Y Z' } } require 'optparse' class PhonewordOptions < OptionParser attr_reader :dictionary, :encoding, :format, :allow_skips, :help, :encoding_help, :verbose, :min_length def initialize super() @dictionary = '/usr/share/dict/words' @encoding = :james @format = :plain @allow_skips = true @help = false @encoding_help = false @verbose = false @ignore_non_alpha = false @min_length = 1 self.on("-d", "--dictionary DICTIONARY", String) { | v | @dictionary = v } self.on("-e", "--encoding ENCODING", String, "How the alphabet is encoded to phonenumbers. james or logic are supported.") { | v | @encoding = v.downcase.to_sym } self.on("-p", "--plain", 'One result per found number, no other information. (Default)') { @format = :plain } self.on("-f", "--full", 'Prefix the result with the number') { @format = :full } self.on("-v", "--verbose", 'Make more noise') { @verbose = true } self.on("-s", "--skips", "--allow_skips", "--allow-skips", 'Allow to skip one adjacent number while matching. (Default)', 'Gives lots of ugly results, but james asked for it.') { @allow_skips = true } self.on("-c", "--no-skips", "Don't leave numbers in the detected words") { @allow_skips = false } self.on("-m" "--min-length", "Minimum length of accepted words.", "Use this to ignore one-letter words that make the output quite uninteresting.", Integer) { | v | @min_length = v } self.on("-?", "--help") { @help = true } self.on("--supported-encodings", "--encoding-help", "List the supported encodings") { @encoding_help = true } end end options = PhonewordOptions.new begin options.parse!(ARGV) rescue => e puts e puts options exit end if options.help puts options exit end if options.encoding_help or !encodings[options.encoding] puts "Possible encodings:" puts encodings.to_a.sort_by{|(k,v)|k.to_s}.map{|(k,v)| "#{k}:\n"+v.map{|(n,e)|" #{n}: #{e}"}.sort.join("\n")} exit end dictionary = Dictionary.new(encodings[options.encoding]).load_wordlist(File.open(options.dictionary), options) output = { :plain => lambda do | number, sentence | sentence end, :full => lambda do | number, sentence | "#{number.ljust(15)}: #{sentence}" end } method = {true => :find, false => :find_noskip } ARGF.each do | number | number.strip! number = number.gsub(/[^0-9]/, '').unpack('C*').map{|n|n - ?0} $stderr.puts "Searching for #{number}" if options.verbose dictionary.send(method[options.allow_skips], number).each do | sentence | puts output[options.format][number, sentence] end end