On Jan 27, 2007, at 8:25 AM, Mark Woodward wrote: > On Sat, 27 Jan 2007 23:38:30 +1100, Mark Woodward wrote: > >> Hi all, >> >> I'm hoping for some critique of the code below. >> It works but is slow and probably not 'the ruby way' >> >> --------------------------------------------------------- >> # Anagrams >> # >> # Purpose: find all anagrams in /usr/share/dict/words >> # for 4 letter words (length 5 because of \n) >> >> # path to the word file >> filename = "/usr/share/dict/words" >> >> # check file exists and is readable >> if File.file?(filename) && File.readable?(filename) then >> dict_words = [] >> all_anagrams = {} >> >> open(filename).each do |dict_word| >> if dict_word.length == 5 then >> dict_words << dict_word.chomp.strip >> end >> end >> >> wrds = dict_words >> >> dict_words.each do |wrd| >> current_wrd = wrd.split(//).sort.join >> >> anagrams = [] >> >> wrds.each do |w| >> if wrd != w then >> if not current_wrd == w.split(//).sort.join then >> next >> end >> anagrams.push(w) >> end >> end >> >> if not anagrams.empty? then >> all_anagrams[wrd] = anagrams >> end >> end >> >> p all_anagrams >> >> end >> --------------------------------------------------------- >> >> thanks, > > I just found a solution here: > http://www.joeygibson.com/blog/tech/ruby/index.html > > that is exponentially faster than mine and finds all anagrams not > just 4 > letter words! I just need to understand it now. > > --------------------------------------------------------- > words = IO.readlines("/usr/share/dict/words") Slurps in the entire dictionary as an array of lines (typically much quicker than reading a line at a time even with buffering). > > anagrams = Hash.new([]) Build a new Hash where the default value is an empty array > > words.each do |word| > base = Array.new > word.chomp!.downcase! > > word.each_byte do |byte| > base << byte.chr > end Get the individual bytes in the word (after having stripped any line ending [chomp!] and standardizing the lettercase [downcase!])... > > base.sort! ...and put them in a standard order (by value, the bytes are just integers). This is essentially the same as you were doing with current_wrd (in fact, it might be interesting to see how much this change affects the overall speed). > > anagrams[base.to_s] |= [word] construct the string representation of the array of byte values to use as the hash key, and then use the array union operator to include the new word. > end > > # Find the anagrams by eliminating those with only one word > anagrams.reject! {|k, v| v.length == 1} > > values = anagrams.values.sort do |a, b| > b.length <=> a.length > end Largest sets first. > > File.open("anagrams.txt", "w") do |file| > values.each do |line| > file.puts(line.join(", ")) > end > end > > largest = anagrams.keys.max do |a, b| > a.length <=> b.length > end > > puts "Total: #{anagrams.length}" # > puts "Largest set of anagrams: #{values[0].inspect}" # > print "Longest anagram: #{anagrams[largest].inspect} " # > puts "at #{largest.length} characters each" > --------------------------------------------------------- > > -- > Mark Does that make more sense to you? (I nearly ignored your message, but your second post showed that you were really interested in understanding, not just trying to get others to do your code review ;-) -Rob Rob Biedenharn http://agileconsultingllc.com Rob / AgileConsultingLLC.com