"Christoph Rippel" <crippel / primenet.com> wrote:
>
> > From: Ben Tilly [mailto:ben_tilly / hotmail.com]
[...]
>I also think that your objections towards symmetrization - i.e. swapping
>are not valid since they cost very little compared to the reminder division
>invoked in any hashing scheme and running the interpreter it self of course
>(in my test it makes virtually non difference if you swap or don't swap -
>even so swapping does not gain anything in the ``examples'')
>- swapping IMO simply feels more robust.

I hate the phrase "simply feels more robust".  And my
reaction is so strongly opposite that I want to explain
the cause of my reaction in some detail.

First of all it is an unnecessary step.  We are both
agreed on that.  The algorithm works just fine without
it.

Secondly let us stop and ask ourselves what we want == to
mean for complex data structures.  A reasonable definition
is that two data structures are == when there is no nested
set of lookups that you can find which give apparently
different results.  So, for instance if we take this code:

   a = ["foo"]
   a.push a

then a is == to ["foo", a] which is == to ["foo", ["foo", a]]
and in general if you try to access an element in any of them
you get the same result as from the infinite data structure:

    ["foo", ["foo", ["foo", ... ]]]

I maintain that this definition is both reasonable and
understandable.

Now it is not hard for me to come up with a convincing
demonstration that caching without swapping is always,
no matter how complex the data structure, going to result
in a correct algorithm.  But I had more trouble producing
a demonstration for the general case with swapping where
you might have 10 variables that got compared with each
other, which have a variety of recursive relationships
and appear on both sides of the comparison.

It turns out that there is a simple proof, but it was not
as easy for me to produce it.  Conversely since both
algorithms accomplish the same thing, there is no way
that adding the swap can make your code more robust.

So we have a step that is not needed, that makes the
code harder to analyze, that does not change what
the end result is.  There is no case that will work out
which would not have have worked out anyways.  Thus it
cannot make the code more robust.  It can only make the
code more complex to understand.  And in my books, all
else being equal, complexity is the dire enemy of
robustness.

Sure, it is only one line.  But what does that line
accomplish?  What purpose does it serve?  How does it
justify its continued existence?

(I know, my math background is showing...)

Cheers,
Ben

PS I don't mind complications if they buy you something.
Linear searches are far simpler than hashing, BTrees,
etc, but they are also much slower.  But unless a
complication actually wins me something specific, I try
to avoid it.  And while it is definitely true that there
are good reasons to use these more complex algorithms, it
is also undeniable that many bugs have been created by
people trying to switch to more complex algorithms...
_________________________________________________________________
Get your FREE download of MSN Explorer at http://explorer.msn.com