On Sun, 31 Dec 2000, ts wrote:

> >>>>> "D" == David Alan Black <dblack / candle.superlink.net> writes:
> 
> D> So the rate of same words probably varies from one file to another.
> D> But the effect of two same words should be the same as the effect of
> D> an anagram (i.e., we're expecting some of that anyway), shouldn't it?
> 
>  I'm not really sure, but I think that the GC can matter, and you
>  probably won't get the same results with 10000 different words.


I've now tried a few different ways of generating the word lists, and
it does seem that the unpack approach scales much better than the
prime number hash as the word length grows.
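
For anyone who hasn't followed the thread, here is a rough sketch of
the two kinds of anagram key being compared. It is a reconstruction
for illustration, not the code that was timed: the method names, the
prime table, and the downcasing are all assumptions, and it only
handles ASCII letters.

```ruby
# Hypothetical reconstruction of the two anagram keys under discussion.
# Assumes words contain ASCII letters only; case is folded (an assumption).

# The first 26 primes, one per letter a..z.
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
          43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101].freeze

# Prime number hash: multiply the primes assigned to each letter.
# Anagrams give the same product because multiplication commutes.
def prime_key(word)
  word.downcase.each_byte.inject(1) { |product, b| product * PRIMES[b - 97] }
end

# Unpack approach: unpack the word into character codes and sort them.
# Anagrams give the same sorted array.
def unpack_key(word)
  word.downcase.unpack("C*").sort
end

# Group words by whichever key the block computes and keep only the
# groups that contain more than one word.
def anagram_groups(words)
  groups = Hash.new { |hash, key| hash[key] = [] }
  words.each { |word| groups[yield(word)] << word }
  groups.values.select { |group| group.size > 1 }
end
```

Either key can be plugged in, e.g.
anagram_groups(words) { |w| unpack_key(w) }. Note that the prime
product grows with every letter, while the sorted array grows only
linearly.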

Summary:


   word length:    4      5      6      7      8      9     10
   ___________________________________________________________

   Word lists derived from /usr/dict/words

     prime2      1.62   1.83   2.09   2.66   3.16   3.71   4.17
     unpack      2.55   2.60   2.66   2.71   2.78   2.82   2.90


   Words starting with 'a' * n and incrementing (aaaaa, aaaab, etc.)

     prime2      1.72   2.05   2.54   3.20   4.25   5.84   9.37
     unpack      2.38   2.39   2.45   2.47   2.52   2.55   2.60


   Randomly generated words from ('a'..'z', 'A'..'Z')

     prime2      1.57   1.66   2.00   2.47   2.85   3.35   3.72
     unpack      2.28   2.25   2.29   2.32   2.38   2.43   2.47


So... I think for general anagram-finding, the unpack one is still
the one to beat.  (Of course, it is not impossible that one would want
to find anagrams of 5-letter words, where the prime number hash still
comes out ahead.  My first Ruby program of any size was a Jotto
implementation.  Maybe it's time to have another look at that :-)


David

-- 
David Alan Black
home: dblack / candle.superlink.net
work: blackdav / shu.edu
Web:  http://pirate.shu.edu/~blackdav