On Tue, 2 Jan 2001, Ben Tilly wrote: > gotoken / math.sci.hokudai.ac.jp (GOTO Kentaro) wrote: > > > >In message "[ruby-talk:8413] Re: speedup of anagram finder" > > on 00/12/31, David Alan Black <dblack / candle.superlink.net> writes: > > > > >So.... I think for general anagram-finding, the unpack one is still > > >the one to beat. (Of course, it is not impossible that one would want > > >to find anagrams of 5-letter words. My first Ruby program of any size > > >was a Jotto implementation. Maybe it's time to have another look at > > >that :-) > > > >Thank you for interesting measurements. Looking that, I'm feeling > >difficulty of the average cost analysis in pragmatic sense. As Guy > >said [ruby-talk:8414], we should consider about also frequency of GC. > > At some point in optimization you always reach the point where > you make trade-offs. There isn't necessarily better in general. > Merely better for my situation. My situation is that I want to do general anagram-finding :-) Assuming you're responding to my use of the word "general" -- I *think* it was reasonably clear that I was talking about finding anagrams among a mixed, non-fixed-length, list of words (such as a sizable chunk of /usr/dict/words), rather than taking some overall position against the idea that optimization involves trade-offs. (See the next sentence, for example.) But maybe I'm misreading your response. > Has anyone tried using the frequency distribution of characters in > English? Have the most common letters assigned to the smallest > primes. This should keep the size of the index down, and I think > would significantly improve performance... Interesting idea. A first go at this (with, I confess, papered-over treatment of uppercase letters, i.e., everything is getting downcased pending some work on that part) is the first threat to the Reign of Unpack. The following is based on all of /usr/dict/words, using letter frequencies derived directly from that file: user system total real [first reports removed] frequency 14.470000 0.000000 14.470000 ( 14.483800) orig_primes 24.450000 0.010000 24.460000 ( 24.458457) unpack 15.550000 0.050000 15.600000 ( 15.613195) OK, time to dust off the argument that hard-coding to ASCII is a limited approach :-) David -- David Alan Black home: dblack / candle.superlink.net work: blackdav / shu.edu Web: http://pirate.shu.edu/~blackdav