On Jan 27, 2007, at 8:25 AM, Mark Woodward wrote:
> On Sat, 27 Jan 2007 23:38:30 +1100, Mark Woodward wrote:
>
>> Hi all,
>>
>> I'm hoping for some critique of the code below.
>> It works but is slow and probably not 'the ruby way'
>>
>> ---------------------------------------------------------
>> # Anagrams
>> #
>> # Purpose: find all anagrams in /usr/share/dict/words
>> #          for 4 letter words (length 5 because of \n)
>>
>> # path to the word file
>> filename = "/usr/share/dict/words"
>>
>> # check file exists and is readable
>> if File.file?(filename) && File.readable?(filename) then
>>   dict_words = []
>>   all_anagrams = {}
>>
>>   open(filename).each do |dict_word|
>>     if dict_word.length == 5 then
>>       dict_words << dict_word.chomp.strip
>>     end
>>   end
>>
>>   wrds = dict_words
>>
>>   dict_words.each do |wrd|
>>     current_wrd = wrd.split(//).sort.join
>>
>>     anagrams = []
>>
>>     wrds.each do |w|
>>       if wrd != w then
>>         if not current_wrd == w.split(//).sort.join then
>>           next
>>         end
>>         anagrams.push(w)
>>       end
>>     end
>>
>>     if not anagrams.empty? then
>>       all_anagrams[wrd] = anagrams
>>     end
>>   end
>>
>>   p all_anagrams
>>
>> end
>> ---------------------------------------------------------
>>
>> thanks,
>
> I just found a solution here:
> http://www.joeygibson.com/blog/tech/ruby/index.html
>
> that is exponentially faster than mine and finds all anagrams not  
> just 4
> letter words! I just need to understand it now.
>
> ---------------------------------------------------------
> words = IO.readlines("/usr/share/dict/words")
Slurps in the entire dictionary as an array of lines (typically much  
quicker than reading a line at a time even with buffering).
>
> anagrams = Hash.new([])
Build a new Hash where the default value is an empty array
>
> words.each do |word|
>     base = Array.new
>     word.chomp!.downcase!
>
>     word.each_byte do |byte|
>         base << byte.chr
>     end
Get the individual bytes in the word (after having stripped any line  
ending [chomp!] and standardizing the lettercase [downcase!])...
>
>     base.sort!
...and put them in a standard order (by value, the bytes are just  
integers).  This is essentially the same as you were doing with  
current_wrd (in fact, it might be interesting to see how much this  
change affects the overall speed).
>
>     anagrams[base.to_s] |= [word]
construct the string representation of the array of byte values to  
use as the hash key, and then use the array union operator to include  
the new word.

> end
>
> # Find the anagrams by eliminating those with only one word
> anagrams.reject! {|k, v| v.length == 1}
>
> values = anagrams.values.sort do |a, b|
>     b.length <=> a.length
> end
Largest sets first.
>
> File.open("anagrams.txt", "w") do |file|
>     values.each do |line|
>         file.puts(line.join(", "))
>     end
> end
>
> largest = anagrams.keys.max do |a, b|
>     a.length <=> b.length
> end
>
> puts "Total: #{anagrams.length}" #
> puts "Largest set of anagrams: #{values[0].inspect}" #
> print "Longest anagram: #{anagrams[largest].inspect} " #
> puts "at #{largest.length} characters each"
> ---------------------------------------------------------
>
> -- 
> Mark

Does that make more sense to you?  (I nearly ignored your message,  
but your second post showed that you were really interested in  
understanding, not just trying to get others to do your code review ;-)

-Rob

Rob Biedenharn		http://agileconsultingllc.com
Rob / AgileConsultingLLC.com