On Sat, Dec 31, 2011 at 3:16 PM, Nikolai Weibull <now / bitwi.se> wrote:
> On Sat, Dec 31, 2011 at 13:58, Robert Klemme <shortcutter / googlemail.com> wrote:
>> On Fri, Dec 30, 2011 at 3:43 PM, Nikolai Weibull <now / bitwi.se> wrote:
>
>>>> But still, I don't see the need.  Note also that a proper Hash key
>>>> should usually be immutable, because changing it causes all sorts of
>>>> trouble if not done carefully.
>>>
>>> Hence the use of "value object" in my question.
>
>> I can't see that "value object" implies "immutable".
>>
>> http://c2.com/cgi/wiki?ValueObject
>> http://en.wikipedia.org/wiki/Value_object
>
> Continue reading:
>
> http://c2.com/cgi/wiki?ValueObjectsShouldBeImmutable

Ah, OK.  Still it's not a "must".
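Here is a small sketch of the trouble with mutable Hash keys mentioned above. It uses an Array as a stand-in for a mutable value object. `Hash#rehash` is Ruby's documented remedy. Whether the stale lookup actually misses is an implementation detail, so the comment is hedged.

```ruby
# An Array stands in for a mutable value object used as a Hash key.
key = [1, 2]
table = { key => :found }

key << 3               # mutating the key changes key.hash
stale = table[key]     # usually nil: the entry is filed under the old hash value

table.rehash           # re-file every entry under its key's current hash
fresh = table[key]     # found again after rehashing

puts stale.inspect, fresh.inspect
```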

>>> Second, let me rephrase my
>>> question and add some additional context and examples:
>>>
>>> What algorithm should one employ in the calculation of the hash value
>>> of an arbitrary value object?
>
>> There is no single standard (or best) way.  The fact that different
>> languages (Java, Ruby...) calculate combined hash values in different
>> ways, all of which seem to work pretty well, indicates this IMHO.
>
> Really?  Everyone seems to use the XOR method (with good cause).

No, not the simple "XOR all hash codes" method.  Consider hashCode() in
java.util.AbstractList<E>:

    public int hashCode() {
        int hashCode = 1;
        Iterator<E> i = iterator();
        while (i.hasNext()) {
            E obj = i.next();
            hashCode = 31*hashCode + (obj==null ? 0 : obj.hashCode());
        }
        return hashCode;
    }
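This sketch puts the Java code above next to the plain XOR scheme, both in Ruby. The method names `xor_hash` and `poly_hash` are made up for illustration. XOR-ing all field hashes collides whenever fields are permuted or repeated, because `x ^ x == 0`. The Java-style multiply-and-add is order sensitive. Masking to 32 bits is an assumption meant to imitate Java's `int` wraparound, since Ruby Integers never overflow.

```ruby
# "XOR all hash codes": cheap, but order-insensitive, and equal pairs cancel out.
def xor_hash(values)
  values.reduce(0) { |h, v| h ^ v.hash }
end

# Java AbstractList style: h = 31*h + element hash, starting from 1.
# The 32-bit mask only mimics Java's int overflow; it is not what Ruby itself does.
def poly_hash(values)
  values.reduce(1) { |h, v| (31 * h + (v.nil? ? 0 : v.hash)) & 0xFFFF_FFFF }
end

p xor_hash([1, 2]) == xor_hash([2, 1])      # => true, a permutation collides
p xor_hash([1, 1]) == xor_hash([:a, :a])    # => true, both are 0
p poly_hash([1, 2]) == poly_hash([2, 1])    # order matters here
```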

> As I've already pointed out, internally, Ruby does something
> completely different.

It's bit manipulation as well, as far as I can see, but more complex
than either the simple XOR-all scheme or the Java version.

>>> I would claim that the algorithm should take the class of the object
>>> into account as well, both for consistency with #=3D=3D (which should
>>> check equality of the classes of the objects being compared) and for
>>> added entropy.
>
>> You pay a price for additional calculation though.
>
> Since the fields are immutable, the result of the calculation can be
> cached, so that's not a valid reason to exclude it.

You can only cache it if the object is frozen.  And the question
remains whether including the class's hash brings a significant
gain.
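On caching: a frozen object cannot assign an instance variable lazily. A `@hash ||= ...` inside `#hash` raises `FrozenError` once the object is frozen. So a cached hash has to be computed in `initialize`, before `freeze` is called. Below is a minimal sketch with a hypothetical `Money` class. It also folds the class into the hash, as suggested above.

```ruby
# Hypothetical immutable value object whose hash is computed exactly once.
class Money
  attr_reader :amount, :currency

  def initialize(amount, currency)
    @amount = amount
    @currency = currency.dup.freeze
    # Compute while still mutable; including the class means another class
    # with the same field values will (almost certainly) hash differently.
    @hash = [self.class, @amount, @currency].hash
    freeze
  end

  # Return the cached value; no work and no ivar assignment after freezing.
  def hash
    @hash
  end

  # #eql? must agree with #hash for use as a Hash key.
  def eql?(other)
    other.instance_of?(self.class) &&
      amount.eql?(other.amount) && currency.eql?(other.currency)
  end
  alias_method :==, :eql?
end

a = Money.new(100, "EUR")
b = Money.new(100, "EUR")
p a.hash == b.hash   # => true
p({ a => :ok }[b])   # => :ok
```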

>>> Internally, Ruby (primarily) uses three C functions for the
>>> calculation of combined hash values, namely rb_hash_start,
>>> rb_hash_uint, and rb_hash_end.  As an example, the hash value of a
>>> Struct is calculated (in Ruby with these three functions wrapped in an
>>> imaginary module C) as
>>>
>>> class Struct
>>>   def hash
>>>     C.rb_hash_end(reduce(C.rb_hash_start(self.class.hash)) { |h, v|
>>>       C.rb_hash_uint(h, v.hash)
>>>     })
>>>   end
>>> end
>>>
>>> Might it be useful to have Ruby expose a way to perform this
>>> calculation from the Ruby realm so that other classes may employ this
>>> algorithm?
>
>> Not sure whether we would really gain that much.  Those calls are
>> efficient in C but if you provide that mechanism in Ruby land you will
>> have multiple calls, e.g.
>>
>> def hash
>>   h = Fixnum::HASH_START
>>   h = h.combine_hash(@a)
>>   h = h.combine_hash(@b)
>>   h = h.combine_hash(@c)
>> end
>
> I don't understand what you're getting at with this example.  It
> doesn't seem to add anything to the discussion.

It demonstrates that whatever would be exposed to Ruby land would need
multiple method calls to avoid the overhead of creating an object.  If
you are willing to pay that price, you can just as well use [@a,@b,@c].hash.
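Written out in full, the Array-based approach needs only one line per method. Each call allocates one temporary Array, which is the object-creation cost discussed above. Here is a sketch with a hypothetical three-field class whose fields mirror @a, @b and @c from the example.

```ruby
# Hypothetical value object with three fields, mirroring @a, @b, @c above.
class Triple
  def initialize(a, b, c)
    @a, @b, @c = a, b, c
    freeze
  end

  # Let Array#hash do the combining; costs one temporary Array per call.
  def hash
    fields.hash
  end

  # #eql? must agree with #hash, or Hash lookups will miss.
  def eql?(other)
    other.class == self.class && fields.eql?(other.fields)
  end
  alias_method :==, :eql?

  protected

  # Visible to other Triples (for #eql?) but not to outside callers.
  def fields
    [@a, @b, @c]
  end
end

table = { Triple.new(1, "two", :three) => :found }
p table[Triple.new(1, "two", :three)]   # => :found
```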

> My example code,
> which shows how Ruby does it internally for Struct, makes multiple
> calls.  Since these methods would be simple wrappers of the C
> functions, the hash calculation would (almost) be as fast as it would
> be for Struct.

You still have some overhead.  And then it might be more efficient to
use Array's implementation.  It's certainly simpler.

I still cannot see why there should be a "standard way of implementing
#hash for value objects".  We have Struct's way and Array's way and we
can use other approaches like XOR all.  Why would a standard make
things better?
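Struct's way can already be used as is. A Struct subclass gets #hash and #eql? derived from its member values, and in MRI the hash is seeded with the class's hash, as in the C-based code quoted earlier. This quick check shows that equal members in different Struct classes do not act as the same key. The names Point and Vector are arbitrary.

```ruby
Point  = Struct.new(:x, :y)
Vector = Struct.new(:x, :y)   # same members, different class

p Point.new(1, 2).hash == Point.new(1, 2).hash   # => true
p Point.new(1, 2).eql?(Vector.new(1, 2))         # => false

keys = { Point.new(1, 2) => :point }
p keys[Point.new(1, 2)]    # => :point
p keys[Vector.new(1, 2)]   # => nil, since Struct#eql? also compares classes
```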

Cheers

robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/