On Sun, 2 Mar 2003, Mauricio Fern?ndez wrote:

> What about the following? I get a speed increase of 3% (compared to
> the non-inlined version)  with the second test of the GCLS. I
> eliminate 2 instructions per iteration (on x86) out of 10 in the loop.
> It's OK to use compiler extensions if we detect them in configure,
> isn't it?

> +#elif HAVE_COMPUTED_GOTO
> +    static void* unrolled_addr[] = {
> +        &&PC_0,
> +        &&PC_1,
[...]
> +    PC_32: key = key*65599 + *p++;
> +    PC_31: key = key*65599 + *p++;

All this manual unrolling may have some effect, but wouldn't it be much
more worth it to make the computations parallelizable ? in this case they
are not because key is reassigned to every time.

e.g.:

key = key*65599 + *p++;
key = key*65599 + *p++;

could become:

key = key*8261505 + p[0]*65599 + p[1]; p+=2;

(note that computed-goto is not compatible with that last trick)

Or else, a new hashfunction could be defined to work on shorts or longs
instead of char.

It all depends on how the following four balance:

 1. penalty of computed-goto
 2. penalty of ordinary unrolled loop
 3. improvement caused by merging statements
 4. improvement caused by changing the formula to work on longs

I haven't tried them though. I'm just trying to contribute a few ideas.

________________________________________________________________
Mathieu Bouchard                       http://artengine.ca/matju