"Eugene Kalenkovich" <rubify / softover.com> wrote in message 
news:AK_Ji.6292$TH2.5856 / trndny06...
> Here is my solution. Obviously it is not as "well educated" as Steve's one 
> because it does not use pronunciation dictionary, but it behaves pretty 
> well with what it has :)

That is one more limit of a good education: sometimes it is only as good as 
its source. Unfortunately I did not have the patience to test it against all 
of the new expectations, since even a single example takes too long (without 
RubyInline), but I know it will miss at least part of the new set:

# expectations.rb
$expectations = {
  %w[pro 1 sal] => 'provencal',
  %w[2 thing] => 'toothing',
  %w[r b trash] => 'arbitrage',
  %w[mon 3 l] => 'montreal',
  %w[3 men dose] => 'tremendous',
  %w[mid wind r] => 'midwinter',
  %w[yes tier knight] => 'yesternight',
  %w[mar well s]=>'marvelous',
  %w[vert x]=>'vertex',
  %w[ban l eyes] => 'banalize',
  %w[harm n eyes] => 'harmonize',
  %w[harm o niece east] => 'harmonicist',
  %w[knight in gale] => 'nightingale',
  %w[knee hill east] => 'nihilist',
  %w[mass car pone a] => 'mascarpone',
  %w[cock knee] => 'cockney',
}
################################################

And two more variations of my solution.
First: a compromise with a coarsened hash, balancing decent matching against 
decent performance. This one is probably the best I could get.

Second: a full scan. It allows more concise code and slightly better 
matching at the cost of performance (but it is still several times faster 
than Steve's without inline).
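
To make the trade-off concrete, here is a minimal sketch of the two lookup 
strategies. It does not use the text gem, and everything in it is an 
assumption for illustration: toy_key (the first two letters) only stands in 
for the shortened Double Metaphone key, edit_distance is a plain Levenshtein 
standing in for the combined metric, and the word list is made up.

```ruby
# Plain Levenshtein distance; a stand-in for the metadist metric of the variants.
def edit_distance(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    cur = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      cur << [prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = cur
  end
  prev.last
end

# Hypothetical coarse key: the first two letters (NOT Double Metaphone).
def toy_key(word)
  word[0, 2]
end

# Variant 1 idea: bucket the dictionary once, then rank only one bucket.
def build_buckets(words)
  words.group_by { |w| toy_key(w) }
end

def bucketed_best(buckets, query)
  (buckets[toy_key(query)] || []).min_by { |w| edit_distance(query, w) }
end

# Variant 2 idea: rank every word in the dictionary.
def full_scan_best(words, query)
  words.min_by { |w| edit_distance(query, w) }
end

words   = %w[vertex vortex vertigo montreal marvelous]  # made-up word list
buckets = build_buckets(words)

p bucketed_best(buckets, "montreel")  # => "montreal"
p full_scan_best(words, "montreel")   # => "montreal"
p bucketed_best(buckets, "wertex")    # => nil (no bucket for "we")
p full_scan_best(words, "wertex")     # => "vertex"
```

The last two lines show the price of the coarse key: once the key of the 
query lands in the wrong bucket, the right word is never even ranked, while 
the full scan still finds it by paying for a distance against every word.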

# Variant 1
require 'rubygems'
require 'text'
include Text::Metaphone
include Text::Levenshtein
load 'expectations.rb'

subs={'1'=>'wan','2'=>'to','3'=>'tre','4'=>'for','5'=>'five','6'=>'six','7'=>'seven','8'=>'ate','9'=>'nine','10'=>'ten',
      'c'=>'see','h'=>'eich','j'=>'jey','k'=>'key','q'=>'que','r'=>'ar'}
subsy={}
%w[b c d g p t v z].each {|l| subsy[l]=l+'y'}
%w[b c d g p t v z].each {|l| subs[l]=l+'ee'}
%w[f l m n s x].each{|l| subs[l]='e'+l}

def metadist(str1,str2)
  2*distance(metaphone(str1),metaphone(str2))+
  distance(str1,str2)
end

def short_double_metaphone(word)
  m1,m2=double_metaphone(word)
  [m1[0,2],m2 ? m2[0,2] : nil]
end

hash=Hash.new{|h,k|h[k]=[]}

File.open("/usr/share/dict/words") {|f| f.readlines}.each do |w|
  word=w.downcase.delete("^a-z")
  m1,m2=short_double_metaphone(word)
  hash[m1]<<word
  hash[m2]<<word if m2
end
$expectations.values.each { |word|
  m1,m2=short_double_metaphone(word)
  hash[m1]<<word
  hash[m2]<<word if m2
}

hash.each_key{|k| hash[k].uniq!}

inputs=[]
if (ARGV.empty?)
  inputs=$expectations.keys
else
  inputs << ARGV
end

inputs.each { |rebus|
  y_ed=rebus[0..-2]<<(subsy[rebus[-1]] || rebus[-1])
  word=y_ed.map{|w| subs[w] || w }.join.downcase.gsub(/[^a-z0-9]/,'')
  m1,m2=short_double_metaphone(word)
  results=hash[m1]
  results+=hash[m2] if m2 && m2!=m1
  res=results.uniq.sort_by{|a| [metadist(word,a),a.length]}.first(5)
  print "'#{rebus.join(' ')}' => #{res[0]}"
  expected=$expectations[rebus]
  print ", expected '#{expected}' is at position #{res.index(expected)}" if expected
  puts
}
################################################
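
The expansion step that both variants share can be tried on its own, without 
the dictionary or the text gem. The sketch below copies the substitution 
tables from Variant 1 ('1' => 'wan'); the helper names build_tables and 
expand are mine, and the rebus %w[par t] is a made-up example that is not in 
expectations.rb.

```ruby
# Build the two substitution tables exactly as the variants do.
# Note that, as in the post, the loop overwrites 'c' => 'see' with 'cee'.
def build_tables
  subs = {'1'=>'wan','2'=>'to','3'=>'tre','4'=>'for','5'=>'five','6'=>'six',
          '7'=>'seven','8'=>'ate','9'=>'nine','10'=>'ten',
          'c'=>'see','h'=>'eich','j'=>'jey','k'=>'key','q'=>'que','r'=>'ar'}
  subsy = {}
  %w[b c d g p t v z].each { |l| subsy[l] = l + 'y' }   # used for a final letter only
  %w[b c d g p t v z].each { |l| subs[l] = l + 'ee' }
  %w[f l m n s x].each { |l| subs[l] = 'e' + l }
  [subs, subsy]
end

# Turn a rebus (array of tokens) into one lowercase string to match against.
def expand(rebus, subs, subsy)
  # A final single letter such as 't' is read as 'ty' rather than 'tee'.
  y_ed = rebus[0..-2] << (subsy[rebus[-1]] || rebus[-1])
  y_ed.map { |w| subs[w] || w }.join.downcase.gsub(/[^a-z0-9]/, '')
end

subs, subsy = build_tables
p expand(%w[2 thing], subs, subsy)    # => "tothing"
p expand(%w[vert x], subs, subsy)     # => "vertex"
p expand(%w[mon 3 l], subs, subsy)    # => "montreel"
p expand(%w[par t], subs, subsy)      # => "party"  (made-up rebus)
```

Only 'vert x' expands to the target spelling directly; the others produce a 
near miss such as "montreel", which is why the result still has to be ranked 
against the dictionary by phonetic and spelling distance.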


# Variant 2
require 'rubygems'
require 'text'
include Text::Metaphone
include Text::Levenshtein
load 'expectations.rb'

subs={'1'=>'won','2'=>'to','3'=>'tre','4'=>'for','5'=>'five','6'=>'six','7'=>'seven','8'=>'ate','9'=>'nine','10'=>'ten',
      'c'=>'see','h'=>'eich','j'=>'jey','k'=>'key','q'=>'que','r'=>'ar'}
subsy={}
%w[b c d g p t v z].each {|l| subsy[l]=l+'y'}
%w[b c d g p t v z].each {|l| subs[l]=l+'ee'}
%w[f l m n s x].each{|l| subs[l]='e'+l}

def metadist(str1,str2)
  2*distance(metaphone(str1),metaphone(str2))+
  distance(str1,str2)
end

dict = File.open("/usr/share/dict/words") {|f| f.readlines}
words = (dict.map{|word| word.downcase.delete("^a-z")}+$expectations.values).uniq

inputs=[]
if (ARGV.empty?)
  inputs=$expectations.keys
else
  inputs << ARGV
end

inputs.each { |rebus|
  y_ed=rebus[0..-2]<<(subsy[rebus[-1]] || rebus[-1])
  word=y_ed.map{|w| subs[w] || w }.join.downcase.gsub(/[^a-z0-9]/,'')
  res=words.sort_by{ |a| [metadist(word,a),a.length] }.first(5)
  print "'#{rebus.join(' ')}' => #{res[0]}"
  expected=$expectations[rebus]
  print ", expected '#{expected}' is at position #{res.index(expected)}" if expected
  puts
}

################################################