On Sun, 31 Dec 2000, ts wrote:

> >>>>> "D" == David Alan Black <dblack / candle.superlink.net> writes:
>
> D> So the rate of same words probably varies from one file to another.
> D> But the effect of two same words should be the same as the effect of
> D> an anagram (i.e., we're expecting some of that anyway), shouldn't it?
>
>  I'm not really sure but I think that the GC can have an importance, and
>  probably you'll don't have the same results with 10000 differents words.

I've now tried a few different ways of generating the word lists, and
it does seem that the unpack approach scales much better than the
prime number hash, as the word length gets longer.  Summary:

word length:      4      5      6      7      8      9     10
_____________________________________________________________

Word lists derived from /usr/dict/words

prime2         1.62   1.83   2.09   2.66   3.16   3.71   4.17
unpack         2.55   2.60   2.66   2.71   2.78   2.82   2.90

Words starting with 'a' * n and incrementing (aaaaa, aaaab, etc.)

prime2         1.72   2.05   2.54   3.20   4.25   5.84   9.37
unpack         2.38   2.39   2.45   2.47   2.52   2.55   2.60

Randomly generated words from ('a'..'z', 'A'..'Z')

prime2         1.57   1.66   2.00   2.47   2.85   3.35   3.72
unpack         2.28   2.25   2.29   2.32   2.38   2.43   2.47

So.... I think for general anagram-finding, the unpack one is still
the one to beat.

(Of course, it is not impossible that one would want to find anagrams
of 5-letter words.  My first Ruby program of any size was a Jotto
implementation.  Maybe it's time to have another look at that :-)

David

-- 
David Alan Black
home: dblack / candle.superlink.net
work: blackdav / shu.edu
Web:  http://pirate.shu.edu/~blackdav
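
For readers without the earlier messages to hand, here is a minimal sketch of what the two keying strategies compared above might look like. It is a reconstruction under assumptions, not the code that was actually timed: the names PRIMES, prime_key, unpack_key and anagram_groups are hypothetical, and prime_key as written only handles lowercase a-z. The idea is that two words are anagrams exactly when their keys are equal. The prime key is a product that keeps growing with word length, which may be part of why it slows down on longer words, while the unpack key is just a sorted array of character codes.

```ruby
# Hypothetical reconstruction of the two approaches; names are illustrative.

# First 26 primes, one per lowercase letter (assumption: a-z input only).
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
          43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101].freeze

# "Prime number hash": multiply the primes assigned to each letter.
# Multiplication is commutative, so anagrams share the same product.
def prime_key(word)
  word.each_byte.inject(1) { |product, b| product * PRIMES[b - 97] }
end

# "Unpack" approach: character codes, sorted.  Anagrams share the
# same sorted array, and arrays hash by content.
def unpack_key(word)
  word.unpack("c*").sort
end

# Group words by whatever key the block computes, and keep only the
# groups that contain more than one word.
def anagram_groups(words, &key)
  words.group_by(&key).values.select { |group| group.size > 1 }
end
```

Either key can be passed in as the block, for example anagram_groups(list) { |w| unpack_key(w) }, so the same word list can be run through both for comparison.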