Gavin Kistner wrote: >> the cost of computing >> the hash function far outweighs the occasional cost of traversing a >> chain. > That's not something I knew, so it's hard to unforget it. Thanks :) It's probably a little more difficult and woolley now that CPUs are so very much faster than RAM, and caches are big and wide to compensate. But in the old days, compilers used techniques to hash identifiers like hashing the first few characters, the last few, and if the identifier was long, some of the middle ones as well, ignoring any other characters. This was done to avoid CPU cycles integrating the characters. It also works better with the typical character spread of identifiers rather than non-specific data. Nowadays, if you have to fill a cache line, you might as well hash all the bytes you loaded - which will often be 8 or 16 bytes. The extra CPU cycles required will disappear in wait states anyhow... If the data is ASCII, most of the entropy is in the bottom four bits. If you treat the characters in groups of four, you shouldn't just XOR all the groups together, because though you get a 32-bit result, 16 of those bits have very little entropy. I usually do a 5-bit rotate of the accumulating hash value between each word, which is often enough. Not quite as good statistical spread as a CRC, which you can match by randomising each step with a multiply and add. Here's the function I use, which works pretty well, though I won't claim it's optimal - these things always have tradeoffs. For example, this doesn't assume that cp is passed in word-aligned, though that will often be the case. HashValT HashBlock( const char* cp, size_t bytes ) { HashValT val = 0; union { char four[4]; HashValT one; }; for (bytes += 4; bytes > 4;) // Avoid unsigned woes... { one = 0; // Initialise all 4 at once switch (bytes -= 4) // Duff's device: { default: four[3] = *cp++; // Fall through case 3: four[2] = *cp++; // Fall through case 2: four[1] = *cp++; // Fall through case 1: four[0] = *cp++; } val ^= one; val = val*1103515245L + 12345; // For good measure... val = ((val<<5) ^ (val>>(32-5))) ^ one; // Rotate 5 } return val; } > I *think* that the proposed application within my company is a one- time > optimization. If you google on "perfect hashing", that will lead you to a nice tool for this job. It finds a guaranteed 1:1 hash with close to minimum computation on a predefined set of strings. Clifford Heath.