"Eugene Kalenkovich" <rubify / softover.com> wrote in message
news:AK_Ji.6292$TH2.5856 / trndny06...
> Here is my solution. Obviously it is not as "well educated" as Steve's one
> because it does not use pronunciation dictionary, but it behaves pretty
> well with what it has :)

One more limit of a good education - sometimes it is limited by its
education source. Unfortunately I did not have enough patience to test it
on all the new expectations, since even a single example takes too much time
(without ruby inline), but I know that it will miss at least some part of
the new set:

# expectations.rb
$expectations = {
  %w[pro 1 sal] => 'provencal',
  %w[2 thing] => 'toothing',
  %w[r b trash] => 'arbitrage',
  %w[mon 3 l] => 'montreal',
  %w[3 men dose] => 'tremendous',
  %w[mid wind r] => 'midwinter',
  %w[yes tier knight] => 'yesternight',
  %w[mar well s] => 'marvelous',
  %w[vert x] => 'vertex',
  %w[ban l eyes] => 'banalize',
  %w[harm n eyes] => 'harmonize',
  %w[harm o niece east] => 'harmonicist',
  %w[knight in gale] => 'nightingale',
  %w[knee hill east] => 'nihilist',
  %w[mass car pone a] => 'mascarpone',
  %w[cock knee] => 'cockney',
}

################################################

And two more variations of my solution.

The first is a compromise with a coarsened hash (words are bucketed by the
first two characters of their double metaphone codes), balancing decent
matching with decent performance. This one is probably the best I could get.

The second one is a full scan. It allows more concise code and slightly
better matching at the cost of performance (but it is still several times
faster than Steve's without inline).
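To make the trade-off between the two variations concrete, here is a minimal
sketch that runs without the text gem. It is not the posted code: coarse_key
is a hypothetical stand-in for the two-character double metaphone prefix,
edit_distance is a plain Levenshtein implementation, and the six-word list is
made up for illustration.

```ruby
# Plain dynamic-programming Levenshtein distance
# (stand-in for Text::Levenshtein, so the sketch has no gem dependency).
def edit_distance(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Hypothetical coarse key: first letter plus the next non-vowel letter.
# The posted code uses the first two characters of a double metaphone code.
def coarse_key(word)
  rest = word[1..-1].to_s.delete('aeiouy')
  word[0, 1] + rest[0, 1]
end

# Made-up word list, standing in for /usr/share/dict/words.
WORDS = %w[vertex vortex vertigo cortex nightingale cockney]

# Variant 1 idea: index the words by coarse key once, then rank one bucket.
BUCKETS = Hash.new { |h, k| h[k] = [] }
WORDS.each { |w| BUCKETS[coarse_key(w)] << w }

def bucket_lookup(query)
  BUCKETS.fetch(coarse_key(query), []).sort_by { |w| [edit_distance(query, w), w.length] }
end

# Variant 2 idea: rank every word for every query.
def full_scan(query)
  WORDS.sort_by { |w| [edit_distance(query, w), w.length] }
end

p bucket_lookup('vertx')     # ranks only the 3 words sharing the key "vr"
p full_scan('vertx').first   # ranks all 6 words
```

With the bucketed lookup only words that share the query's key are ever
scored, so a word whose key differs (here "cortex") can never be returned;
the full scan scores everything and pays for it on each query.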
# Variant 1
require 'rubygems'
require 'text'
include Text::Metaphone
include Text::Levenshtein
load 'expectations.rb'

# Spoken forms of digits and single letters
subs={'1'=>'wan','2'=>'to','3'=>'tre','4'=>'for','5'=>'five','6'=>'six','7'=>'seven','8'=>'ate','9'=>'nine','10'=>'ten',
      'c'=>'see','h'=>'eich','j'=>'jey','k'=>'key','q'=>'que','r'=>'ar'}
# Alternative spelling for these letters when they are the last token
subsy={}
%w[b c d g p t v z].each {|l| subsy[l]=l+'y'}
%w[b c d g p t v z].each {|l| subs[l]=l+'ee'}
%w[f l m n s x].each{|l| subs[l]='e'+l}

# Metaphone distance counts double, plain spelling distance breaks ties
def metadist(str1,str2)
  2*distance(metaphone(str1),metaphone(str2))+
  distance(str1,str2)
end

# First two characters of each double metaphone code
def short_double_metaphone(word)
  m1,m2=double_metaphone(word)
  [m1[0,2],m2 ? m2[0,2] : nil]
end

# Coarsened hash: short metaphone code => dictionary words
hash=Hash.new{|h,k|h[k]=[]}
File.open("/usr/share/dict/words") {|f| f.readlines}.each do |w|
  word=w.downcase.delete("^a-z")
  m1,m2=short_double_metaphone(word)
  hash[m1]<<word
  hash[m2]<<word if m2
end
# Make sure the expected answers are present in the hash
$expectations.values.each { |word|
  m1,m2=short_double_metaphone(word)
  hash[m1]<<word
  hash[m2]<<word if m2
}
hash.each_key{|k| hash[k].uniq!}

inputs=[]
if (ARGV.empty?)
  inputs=$expectations.keys
else
  inputs << ARGV
end

inputs.each { |rebus|
  y_ed=rebus[0..-2]<<(subsy[rebus[-1]] || rebus[-1])
  word=y_ed.map{|w| subs[w] || w }.join.downcase.gsub(/[^a-z0-9]/,'')
  m1,m2=short_double_metaphone(word)
  results=hash[m1]
  results+=hash[m2] if m2 && m2!=m1
  res=results.uniq.sort_by{|a| [metadist(word,a),a.length]}.first(5)
  print "'#{rebus.join(' ')}' => #{res[0]}"
  expected=$expectations[rebus]
  print ", expected '#{expected}' is at position #{res.index(expected)}" if expected
  puts
}

################################################

# Variant 2
require 'rubygems'
require 'text'
include Text::Metaphone
include Text::Levenshtein
load 'expectations.rb'

# Spoken forms of digits and single letters
subs={'1'=>'won','2'=>'to','3'=>'tre','4'=>'for','5'=>'five','6'=>'six','7'=>'seven','8'=>'ate','9'=>'nine','10'=>'ten',
      'c'=>'see','h'=>'eich','j'=>'jey','k'=>'key','q'=>'que','r'=>'ar'}
# Alternative spelling for these letters when they are the last token
subsy={}
%w[b c d g p t v z].each {|l| subsy[l]=l+'y'}
%w[b c d g p t v z].each {|l| subs[l]=l+'ee'}
%w[f l m n s x].each{|l| subs[l]='e'+l}

# Metaphone distance counts double, plain spelling distance breaks ties
def metadist(str1,str2)
  2*distance(metaphone(str1),metaphone(str2))+
  distance(str1,str2)
end

# Full word list: dictionary plus the expected answers
words = (File.open("/usr/share/dict/words") {|f| f.readlines}.map{|word| word.downcase.delete("^a-z")}+$expectations.values).uniq

inputs=[]
if (ARGV.empty?)
  inputs=$expectations.keys
else
  inputs << ARGV
end

inputs.each { |rebus|
  y_ed=rebus[0..-2]<<(subsy[rebus[-1]] || rebus[-1])
  word=y_ed.map{|w| subs[w] || w }.join.downcase.gsub(/[^a-z0-9]/,'')
  # Full scan: rank every word in the list
  res=words.sort_by{ |a| [metadist(word,a),a.length] }.first(5)
  print "'#{rebus.join(' ')}' => #{res[0]}"
  expected=$expectations[rebus]
  print ", expected '#{expected}' is at position #{res.index(expected)}" if expected
  puts
}

################################################
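For anyone who wants to see what both variants actually match against, the
substitution step can be run on its own, without the text gem or a
dictionary. The sketch below copies the tables from Variant 1; the helper
name expand_rebus is made up here (the scripts do this inline), and the last
example, %w[par t], is not from the expectations file.

```ruby
# Substitution tables as in Variant 1 ('1' => 'wan'; Variant 2 uses 'won').
SUBS = {
  '1' => 'wan', '2' => 'to', '3' => 'tre', '4' => 'for', '5' => 'five',
  '6' => 'six', '7' => 'seven', '8' => 'ate', '9' => 'nine', '10' => 'ten',
  'c' => 'see', 'h' => 'eich', 'j' => 'jey', 'k' => 'key', 'q' => 'que', 'r' => 'ar'
}
SUBSY = {}
%w[b c d g p t v z].each { |l| SUBSY[l] = l + 'y' }   # applied to the last token only
%w[b c d g p t v z].each { |l| SUBS[l] = l + 'ee' }   # note: replaces 'c' => 'see' with 'cee'
%w[f l m n s x].each { |l| SUBS[l] = 'e' + l }

# Hypothetical helper: turn rebus tokens into the string that gets matched.
def expand_rebus(tokens)
  last = tokens[-1]
  y_ed = tokens[0..-2] << (SUBSY[last] || last)
  y_ed.map { |t| SUBS[t] || t }.join.downcase.gsub(/[^a-z0-9]/, '')
end

puts expand_rebus(%w[pro 1 sal])    # prowansal
puts expand_rebus(%w[r b trash])    # arbeetrash
puts expand_rebus(%w[vert x])       # vertex
puts expand_rebus(%w[harm n eyes])  # harmeneyes
puts expand_rebus(%w[par t])        # party
```

The expanded string is only an approximation of the pronunciation; the
metaphone and Levenshtein ranking in the scripts is what maps something like
"prowansal" onto a dictionary word.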