In article <733CA39D4982D411AD080090278CBE399A5217 / shalom.israel.hp.com>,
SHULTZ,BARRY (HP-Israel,ex1) <barry_shultz / hp.com> wrote:
>> go, either. The fact is, I've spent a lot of time the past couple days
>> thinking about
>> a new solution, related to the "primes" solution. If I come up with
>> something 
>
>Here's a faster, but obviously problematic, solution, kind of a first
>approximation for a solution:
>
>Again, if you don't like math, I apologize for any jargon in the following
>text.
>
>I call it the "slide rule" approach. "prime" was slowing up because of the
>way Ruby was handling
>multiplication of large numbers. So I thought about a logarithmic solution
>using

Since I do mathematics, I feel compelled  to add my 2 cents (although it
is *very* elementary  mathematics here). I am afraid  your solution with
floating-point numbers will be lossy  (inaccurate) however hard you try.
A perfect additive index function is  to represent, e.g. the i-th letter
by 2**i and add the values for  each letter. Of course this way, you can
represent (in  26 bits)  only the  words where  no letter  occurs twice.
Using 4**i you may  represent words where a given letter  occurs up to 3
times, in 52 bits. This will be  a slightly slower because you will have
to use two fixnums and test for  the letter being smaller or bigger than
'm' to see in which one it falls. If you use base 256 you may encode any
word where a letter occurs less than 255 times in a string of length 26.
The problem is how  to build this string fast. Maybe  the directive @ of
pack would help, but I expect this  last solution to be slower than sort
probably.

  Here again a good solution would  be to take in account differences in
the  various  letters  behaviour  and a  'mixed  basis'  representation:
assume that  a,e,i can  occur 4  times in  a word  but any  other letter
only  3.  Then  you  may  encode  words by  a  polynomial  of  the  form
a+e*5+i*5**2+b*4*5**3+etc..  (where  a,e,i,b  etc..  is  the  number  of
occurences of  a,e,i,b etc..  in the  word). You  would compute  this by
pre-computing the 'powers' like 4*5**3 in  the mixed basis in some array
'pow', say, then do something like:

    sum=0
    word.each_byte{|b| sum+=pow[b]}

  Anyway I cannot test any of the  solutions I propose for speed since I
forgot to  save the proposals of  other people, so I  let you experiment
with my ideas if you wish.

  Best regards,
     Jean MICHEL