"Mauricio FernáÏdez" <batsman.geo / yahoo.com> wrote in message news:20030302083833.GA8115 / student.ei.uni-stuttgart.de... > > it seems to me that the adoption of one-at-a-time hashes does' nt What does "one at time" mean? One character at a time as opposed to Jenkins 4 characters at a time? > * Jenkins should be even worse as the additive constant to its > complexity is higher > * however, a perf. increase of 1-3% can be achieved by setting the > number of bins in the hash to be a power of 2, so that we can later > use bitwise AND instead of the modulus operator [I assume our hash is > good enough not to increase the num. of collisions a lot] There is some work going on in Jenkins hash, but it happens at the best possible location: inside the processor and at least it attempts to exploit parellism. Modern processors are much faster than memory. If the hash avoids an additional memory access ever so often, it may pay off decently. I would estimate the benefits of any good hash will only really show up in large hash tables. This is because a cache-miss is expensive (many hundred instruction cycles). An extra memory access in a small hash table is likely to happen in memory that is already conveniently in cache. If the bucket size is large you get fewer collissions, but you also increase the risc of cache-misses. This must, however, be held against the cache-misses of reading and comparing the actual keys if not stored inside the bucket itself. If the full 32-bit (or whatever) hash is compared first (i.e. store the full hash key in the bucket), the bucket count does not affect the number of external keys that must be accessed - only the quality of the hash itself. This technique also makes it cheaper to perform a rehash operation when expanding the buckettable. (I sent a copy of my hashtable in private mail). It only operates on 32 bit keys (e.g. storing pointers to already internalized strings), but it uses the above mentioned principle of storing the hash and could be enhanced to operate on longer keys without much effort. The avoidance of modulus prime is good both for avoiding the calculation, and because it makes it much easier to scale the bucket size on demand - thus avoiding large initial bucket tables - which in turn makes conflict resolution cheaper, although more frequent. For small hashtables it may not pay off with a good, but more expensive hash function. When frequently accessed, everything, including keys, will be in-cache, and the statistical spread over buckets is probably unpredictable. Mikkel