On 7/31/06, Francis Cianfrocca <garbagecat10 / gmail.com> wrote:
> Timothy Goddard wrote:
> > Notice that using MD5 is significantly slower than normal string
> > comparison. This also demonstrates that there are few performance gains
> > between 10KB buffers and 100KB buffers, indicating that somewhere in
> > the 10K range would be a good buffer size for the memory/performance
> > tradeoff.
>
> I notice that MD5-generation is not twice as time-consuming as string
> comparison. In fact, it's only a little more time-consuming, which was
> an interesting surprise until I checked the source code and realized
> that Ruby uses the C reference implementation to compute MD5.
>
> Comparing strings is obviously the better choice for doing one-off
> comparisons that won't be repeated. But for applications like
> cache-management or public email systems, where you're going to be
> comparing many times against the same chunk of bits, it makes more sense
> to store an MD5. That way, subsequent trials only have to compute one
> hash, not two.
>
> Someone upthread suggested using SHA1 instead of MD5 for this purpose. I
> haven't done the comparison in Ruby, but in C implementations, SHA1 is
> just slightly slower than MD5, not enough to matter. And Ruby's SHA1
> implementation is also in C.
>
> --
> Posted via http://www.ruby-forum.com/.

The choice of CRC32/MD5/SHA1 is a trade-off between time and space on one
side and false-positive probability on the other. For normal uses, CRC32
(32 bits) plus the size should be enough. A CRC32 also has the nice
property that it fits into a doubleword.

The advantage of MD5 and SHA1 is that they are one-way functions, and that
collisions are hard to find.

So, if you need that 30% speed gain or those 12 bytes per hash (a 4-byte
CRC32 instead of a 16-byte MD5), and you don't need attack-resistance, and
a false-positive probability of 2^-32 is low enough, then use CRC32. If you
do need attack-resistance, I would choose SHA1.
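
To make the trade-off concrete, here is a rough sketch of the "store the fingerprint once, hash only the new candidate" idea from the quoted message, with the fingerprint function left pluggable. The names crc_fingerprint, sha1_fingerprint and FingerprintCache are made up for illustration and appear nowhere in the thread. Zlib.crc32 and Digest::SHA1.digest come from Ruby's standard library. Treat it as a sketch under those assumptions, not a tuned implementation.

```ruby
require 'zlib'
require 'digest'

# Hypothetical helpers for illustration; none of these names come from the thread.

# CRC32 plus byte length: small and fast, but offers no attack resistance.
def crc_fingerprint(data)
  [Zlib.crc32(data), data.bytesize]
end

# SHA1 digest as a 20-byte binary string: larger, but collisions are hard to find.
def sha1_fingerprint(data)
  Digest::SHA1.digest(data)
end

# Fingerprint the stored side once; each later comparison hashes only the candidate.
class FingerprintCache
  def initialize(original, &fingerprint)
    @fingerprint = fingerprint
    @stored = fingerprint.call(original)
  end

  def match?(candidate)
    @fingerprint.call(candidate) == @stored
  end
end

cache = FingerprintCache.new("cached chunk of bits") { |d| crc_fingerprint(d) }
puts cache.match?("cached chunk of bits")  # prints true
puts cache.match?("something else")        # prints false (the sizes already differ)
```

Swapping the block for { |d| sha1_fingerprint(d) } gives the attack-resistant variant at the cost of a bigger stored value and a somewhat slower hash.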