On Fri, Dec 30, 2011 at 3:43 PM, Nikolai Weibull <now / bitwi.se> wrote:
> On Fri, Dec 30, 2011 at 15:02, Robert Klemme <shortcutter / googlemail.com>=
 wrote:
>> On Fri, Dec 30, 2011 at 12:47 AM, Nikolai Weibull <now / bitwi.se> wrote:
>
>>> I seem to have failed in communicating what I=92m after. =A0I wasn=92t =
after
>>> different ways of implementing #hash, I was after the golden standard
>>> of #hash implementations for value objects. =A0So, again, what=92s the
>>> standard way of implementing #hash for value objects in Ruby?
>
>> I have covered this a bit earlier:
>> http://blog.rubybestpractices.com/posts/rklemme/018-Complete_Class.html
>
>> Boils down to that the hash code should be derived from hash codes of
>> all fields used as key fields in determining equivalence.
>
> Precisely. =A0And how do you get a single value from the hash values of
> those fields?
>
>>> One suggestion would be the XOR of the hash values of the object=92s
>>> class and its instance variables.
>
>> Not good, because then the same value in different fields has
>> identical influence on the hash code. =A0Consequently likelihood of
>> collisions increases and thus performance of Hash storage and
>> retrieval may decrease.
>
> This is exactly what you suggest in your article, with the addition
> (or rather XOR) of the hash value of the object=92s class.

Right.  I should probably have given more detail in this thread with
regard to what goals this is "worse" than the simple XOR of member
hash values.

>> But still, I don't see the need. =A0Note also that a proper Hash key
>> usually should be immutable because changing them causes all sorts of
>> trouble if not done carefully.
>
> Hence the use of =93value object=94 in my question.

I can't see that "value object" implies "immutable".

http://c2.com/cgi/wiki?ValueObject
http://en.wikipedia.org/wiki/Value_object

>> And most of the time Hash keys are
>> String, Symbol, Fixnum and the like - which all do have their #hash
>> implementation already.
>
> But how do you combine them? =A0That=92s what this whole thread has been
> about. =A0I realize now that I made quite a few assumptions about what I
> thought readers would understand from my original question that may
> not have been as obvious to them as they were to me.

There you have learned something about communication. :-)

> First, let me apologize for this lack of clarity.

No sweat.

> Second, let me rephrase my
> question and add some additional context and examples:
>
> What algorithm should one employ in the calculation of the hash value
> of an arbitrary value object?

There is no single standard (or best) way.  The fact that different
languages (Java, Ruby...) have different means to calculate combined
hash values which all seem to work pretty well indicates this IMHO.

> As an example, what algorithm should one employ in the calculation of
> the hash value of an immutable class A containing three immutable
> instance variables @a, @b, @c that contain, respectively, a String, a
> Fixnum, and a Symbol and that are all used in the calculation of #=3D=3D?

In this case I would choose the most straightforward solution: XOR of
all member hash codes.  This is sufficient since you indicated that no
value can show up in two different fields.

> The semi-standard solutions seem to be
>
> class A
> =A0def hash
> =A0 =A0@a ^ @b ^ @c
> =A0end
> end
>
> and
>
> class A
> =A0def hash
> =A0 =A0[@a, @b, @c].hash
> =A0end
> end
>
> These are the two main implementations in the Standard Library,
> anyway. =A0The first is also the solution proposed by Robert Klemme in
>
> http://blog.rubybestpractices.com/posts/rklemme/018-Complete_Class.html

And it does work indeed.  But keep in mind that the article is not a
specific article about calculating hash codes but rather about what
has to be done to make a class "complete".  Hash codes are just one
aspect.

> I would claim that the algorithm should take the class of the object
> into account as well, both for consistency with #=3D=3D (which should
> check equality of the classes of the objects being compared) and for
> added entropy.

Considering the most common case that all keys in a Hash are of the
same class it is questionable whether this will really add entropy
since then for all instances the class's hash will be the same.  You
pay a price for additional calculation though.

> Internally, Ruby (primarily) uses three C functions for the
> calculation of combined hash values, namely rb_hash_start,
> rb_hash_uint, and rb_hash_end. =A0As an example, the hash value of a
> Struct is calculated (in Ruby with these three functions wrapped in an
> imaginary module C) as
>
> class Struct
> =A0def hash
> =A0 =A0C.rb_hash_end(reduce(C.rb_hash_start(self.class.hash)){ |h, v|
> C.rb_hash_uint(h, v.hash) })
> =A0end
> end
>
> Might it be useful to have Ruby expose a way to perform this
> calculation from the Ruby realm so that other classes may employ this
> algorithm?

Not sure whether we would really gain that much.  Those calls are
efficient in C but if you provide that mechanism in Ruby land you will
have multiple calls, e.g.

def hash
  h =3D Fixnum::HASH_START
  h =3D h.combine_hash(@a)
  h =3D h.combine_hash(@b)
  h =3D h.combine_hash(@c)
end

It may turn out to be more efficient to just do

def hash
  [@a, @b, @c].hash
end

or simply use instances of Struct generated classes and reuse their
implementation.

Requirements for a hash function are pretty clear (and pretty much standard=
):

1. It should make it as unlikely as possible that two objects which
are not equivalent have the same hash value.
2. Calculation should be as cheap (in terms of CPU cycles) as possible.

http://en.wikipedia.org/wiki/Hash_function#Properties has more detail

As you can see these are no hard numeric or formal requirements and
they both have potential to contradict each other.  So it depends on
the situation what algorithm one chooses.  It is always a trade off
between these two goals and there is no single solution which is best
under all circumstances.

I have attached a stats generator for existing Hashes.  So anybody
interested can play with it and find out the distribution of hash
values for a given Hash.

Kind regards

robert

--=20
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
IyBTdGF0aXN0aWNzIG9mIGhhc2ggdmFsdWUgZGlzdHJpYnV0aW9uIGluCiMgYSBnaXZlbiBIYXNo
LgpIYXNoU3RhdCA9IFN0cnVjdC5uZXcgOm1pbiwgOm1heCwgOmF2ZyBkbwogIGRlZiBzZWxmLnN0
YXRzKGFfaGFzaCkKICAgIGNudCA9IEhhc2gubmV3IDAKICAgIG1heCA9IDAKCiAgICBhX2hhc2gu
a2V5cy5lYWNoIGRvIHxrfAogICAgICBjbnRbay5oYXNoXSArPSAxCiAgICBlbmQKCiAgICBuZXcg
KmNudC52YWx1ZXMubWlubWF4LCBhX2hhc2guc2l6ZS50b19mIC8gY250LnNpemUKICBlbmQKZW5k
Cgo=