Hi Ben,

In message "[ruby-talk:8468] Re: speedup of anagram finder"
    on 01/01/02, "Ben Tilly" <ben_tilly / hotmail.com> writes:
>At some point in optimization you always reach the point where
>you make trade-offs.  There isn't necessarily better in general.
>Merely better for my situation.

I agree 100%. 

>Has anyone tried using the frequency distribution of
>characters in English?  Have the most common letters
>assigned to the smallest primes.  This should keep the
>size of the index down, and I think would significantly
>improve performance...
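A minimal Ruby sketch of Ben's suggestion, for anyone who wants to try it. It assumes the anagram key is the product of one prime per letter (equal products mean equal letter multisets, by unique factorization). It also assumes the letter ordering taken from the counts in this message. FREQ_ORDER, first_primes, prime_table and anagram_key are hypothetical names chosen for illustration, not code from the original anagram finder.

```ruby
# Letters from most to least frequent, following the /usr/share/dict/words
# counts posted in this message (an assumption; swap in any ordering).
FREQ_ORDER = "eiaorntslcupmdhygbfvkwzxqj"

# First n primes by trial division; plenty fast for n = 26.
def first_primes(n)
  primes = []
  candidate = 2
  while primes.size < n
    primes << candidate if primes.none? { |p| candidate % p == 0 }
    candidate += 1
  end
  primes
end

# Map letters to primes in order: first letter -> 2, second -> 3, ...
def prime_table(order)
  order.chars.zip(first_primes(order.size)).to_h
end

# Product of the letters' primes; characters outside the table are skipped.
# Two words get the same key exactly when they use the same letters.
def anagram_key(word, table)
  word.downcase.each_char.reduce(1) { |acc, c| table.key?(c) ? acc * table[c] : acc }
end

freq  = prime_table(FREQ_ORDER)
alpha = prime_table(("a".."z").to_a.join)
p anagram_key("listen", freq) == anagram_key("silent", freq)  # prints true
p anagram_key("tease", freq)    # prints 6460
p anagram_key("tease", alpha)   # prints 1151194
```

The last two lines show the point of the frequency ordering: a word made of common letters gets a much smaller key when common letters map to small primes, so the keys (and any index built on them) stay smaller.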

For anyone who would like to take this up: so far I have only measured
the distribution of characters in /usr/share/dict/words.  Does it match
the well-known statistics derived from English corpora?
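A table like the one below can be reproduced with a short Ruby script. This is a sketch under assumptions and not necessarily how these counts were made: it folds case and counts only the ASCII letters a-z, one word per line. letter_counts is a hypothetical helper name.

```ruby
# Count each letter a-z over any enumerable of lines (e.g. a word list).
# Case is folded; digits, punctuation, newlines and non-ASCII are skipped.
# scrub replaces invalid bytes so downcase cannot fail on odd encodings.
def letter_counts(lines)
  counts = Hash.new(0)
  lines.each do |line|
    line.scrub.downcase.each_char { |c| counts[c] += 1 if c.between?("a", "z") }
  end
  # Most frequent first; ties broken alphabetically.
  counts.sort_by { |c, n| [-n, c] }
end

path = "/usr/share/dict/words"
if File.exist?(path)
  File.open(path) do |f|
    letter_counts(f.each_line).each { |c, n| printf("%5s %10d\n", c, n) }
  end
end
```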

-- Gotoken

 char      count
    e     234814
    i     200619
    a     198957
    o     170392
    r     160496
    n     158281
    t     152574
    s     139244
    l     130178
    c     103307
    u      87213
    p      78075
    m      70505
    d      68008
    h      64165
    y      51527
    g      47011
    b      40357
    f      24112
    v      20104
    k      16022
    w      13826
    z       8441
    x       6926
    q       3730
    j       3075
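Published English letter-frequency tables are usually given as percentages, so comparing against them is easier after converting the raw counts above. One caution: a word list counts each word once, while corpus statistics weight each word by how often it appears in running text, so the two need not agree exactly. The Ruby sketch below only rescales the posted numbers. COUNTS is a hypothetical constant holding the table.

```ruby
# Raw counts copied from the table above (letter => occurrences).
COUNTS = {
  "e" => 234814, "i" => 200619, "a" => 198957, "o" => 170392,
  "r" => 160496, "n" => 158281, "t" => 152574, "s" => 139244,
  "l" => 130178, "c" => 103307, "u" => 87213,  "p" => 78075,
  "m" => 70505,  "d" => 68008,  "h" => 64165,  "y" => 51527,
  "g" => 47011,  "b" => 40357,  "f" => 24112,  "v" => 20104,
  "k" => 16022,  "w" => 13826,  "z" => 8441,   "x" => 6926,
  "q" => 3730,   "j" => 3075
}

# Convert each count to a percentage of all letters counted.
def percentages(counts)
  total = counts.values.sum.to_f
  counts.transform_values { |n| 100.0 * n / total }
end

percentages(COUNTS).each { |c, pct| printf("%5s %6.2f%%\n", c, pct) }
```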