"Daniel Sheppard" <daniels / pronto.com.au> writes:

> I was doing some playing with some implementations for Hash#== and ended
> up with some very counterintuitive timings.
>
> I have four equals methods, and I would have expected their performance
> to run (from best to worst):
>
> directInject - stops evaluating == after first bad match, no
> intermediate arrays 
> keysInject - as above, but must create keys array
> directCollect - evaluated == for all items all the time, intermediate
> array of results
> keysCollect - as above, but must create keys array.

Your *Inject methods don't actually quit early; they just stop
evaluating the second expression once val is false.  An explicit break
speeds up the quick-exit case considerably.

> def directInject(hash1, hash2)
>   hash1.size == hash2.size && 
>     hash1.inject(true) { |val, kv| val && hash2[kv[0]] == kv[1] }
      hash1.inject(true) { |val, kv| val or break; hash2[kv[0]] == kv[1] }
> end
> def keysInject(hash1, hash2)
>   hash1.size == hash2.size && 
>     hash1.keys.inject(true) { |val, k| val && hash1[k] == hash2[k] }
      hash1.keys.inject(true) { |val, k| val or break; hash1[k] == hash2[k] }
> end

Unless you're using a really old ruby, though, Enumerable#all? should
outperform all of the above.

---- revised code ----

def keysCollect(hash1, hash2)
  hash1.size == hash2.size && 
    !(hash1.keys.collect{|k| !hash2[k].nil? && hash1[k] == hash2[k]
}.include?(false))
end
def directCollect(hash1, hash2)
  hash1.size == hash2.size && 
    !(hash1.collect{|k,v| hash2[k] == v }.include?(false))
end
def directInject(hash1, hash2)
  hash1.size == hash2.size && 
    hash1.inject(true) { |val, kv| val or break; hash2[kv[0]] == kv[1] }
end
def keysInject(hash1, hash2)
  hash1.size == hash2.size && 
    hash1.keys.inject(true) { |val, k| val or break; hash1[k] == hash2[k] }
end
def using_all hash1, hash2
  hash1.size == hash2.size &&
    hash1.all?{|k, v| hash2[k] == v}
end

def time(p, *args)
  time = Time.now
  1000.times { p.call *args}
  Time.now - time
end

hash1 = Hash.new
hash2 = Hash.new
xx = 'a'
2000.times { hash1[xx] = xx; xx = xx.next }
2000.times { hash2[xx] = xx; xx = xx.next }

[:keysCollect, :directCollect, :directInject, :keysInject,
 :using_all].each { |s|
  puts
  puts s
  puts "equal     #{time(method(s), hash1, hash1)}"
  puts "not equal #{time(method(s), hash1, hash2)}"
}

---- output ----

keysCollect
equal     3.071011
not equal 1.992086

directCollect
equal     3.211872
not equal 3.447227

directInject
equal     4.202059
not equal 0.004962

keysInject
equal     3.343774
not equal 0.147189

using_all
equal     2.84662
not equal 0.003133

----

Incidentally, if you're using this to compare
equality-of-Hash-sans-default-proc, none of these are adequate in the
general case since keys may exist that map to the default value.

Perhaps:

def using_all hash1, hash2
  hash1.size == hash2.size &&
    hash1.all?{|k, v| hash2.has_key?(k) && hash2[k] == v}
end

Or, as mentioned elsewhere, the shorter:

{}.update(h1) == {}.update(h2)

.... which seems to have better worst-case, but worse best-case
performance.

HTH