On Sun, 2 Mar 2003, Mauricio FernáÏdez wrote: > What about the following? I get a speed increase of 3% (compared to > the non-inlined version) with the second test of the GCLS. I > eliminate 2 instructions per iteration (on x86) out of 10 in the loop. > It's OK to use compiler extensions if we detect them in configure, > isn't it? > +#elif HAVE_COMPUTED_GOTO > + static void* unrolled_addr[] = { > + &&PC_0, > + &&PC_1, [...] > + PC_32: key = key*65599 + *p++; > + PC_31: key = key*65599 + *p++; All this manual unrolling may have some effect, but wouldn't it be much more worth it to make the computations parallelizable ? in this case they are not because key is reassigned to every time. e.g.: key = key*65599 + *p++; key = key*65599 + *p++; could become: key = key*8261505 + p[0]*65599 + p[1]; p+=2; (note that computed-goto is not compatible with that last trick) Or else, a new hashfunction could be defined to work on shorts or longs instead of char. It all depends on how the following four balance: 1. penalty of computed-goto 2. penalty of ordinary unrolled loop 3. improvement caused by merging statements 4. improvement caused by changing the formula to work on longs I haven't tried them though. I'm just trying to contribute a few ideas. ________________________________________________________________ Mathieu Bouchard http://artengine.ca/matju