--ikeVEW9yuYc//A+q
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

// Apologies for the delayed reply – it takes
// a bit to digest such a detailed response! :)

Cezary:

> I'm trying to get an idea of how the implementation decisions behind
> hashes affect the general use of hashes in Ruby and if something
> could be slightly changed in favor improving the user's experience
> with the language without too much sacrifice in other areas.

Right, and that’s a good direction. I’m all for having a new class
tailored to hold name → object mappings, I’m just very happy with
the current Hash class as a tool for object → object mappings (and
for being the base of Set, but that’s an implementation detail).

> I believe Hash was designed with efficiency and speed in mind and
> the recent Hash syntax changes suggest that all the current ways
> people use Hash in Ruby is way beyond scope of the original concept.

I’d say the Hash syntax extensions in Ruby 1.9 are there because
Hashes turned out to be great containers for name → object mappings,
but I’d be wary of jumping to the conclusion that we need to make
Hashes themselves more aware of their (most?) common application,
especially when it means they’d be much more complicated and have to
special-case certain classes of their keys.

(I agree that they’re already somewhat-aware of this by
freezing the Strings used for keys, but that’s mostly because
String#hash does not need to be recomputed for frozen Strings
and it does not really add any magic to the general ideas
behind the underlying object → object mapping model.)
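
To illustrate the key-freezing behaviour mentioned above, here is
a small sketch (the variable names are made up for the example): an
unfrozen String used as a key is stored as a frozen copy, so later
changes to the original do not disturb the Hash.

```ruby
# An unfrozen String used as a Hash key is stored as a frozen copy.
name  = String.new("colour")   # explicitly unfrozen String
prefs = {}
prefs[name] = "blue"

stored_key = prefs.keys.first
name << "s"                    # mutate the original after insertion

key_is_frozen = stored_key.frozen?        # the Hash holds a frozen key
key_is_a_copy = !stored_key.equal?(name)  # and it is a separate object
still_found   = prefs["colour"]           # lookup by the old content works
```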

> Refinements may minimize the need for changes here, but even still,
> I think this is a good time to consider what Hash is used for and how
> syntax changes can help users better express their ideas instead of
> just being able to choose only between an array, a very, very general
> associative array or 3rd party gems that have no syntax support.

Agreed. Come up with a NameHash implementation, maybe even teach
Hash#initialize to return NameHash instances when all of the
keys happen to be names (Strings/Symbols/nil?) if you want to
piggy-back on the nice {…} Hash syntax (although this is a bit
too much magic for my tastes) and we can discuss the details. :)

(Also, consider whether you’re into name-key Hashes
or the slightly more general case of Hashes in which all keys
are of the same class – and which one will be more useful.)
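
As a rough sketch of the ‘all keys of the same class’ variant (the
class name MonoKeyHash and its constructor argument are assumptions
made up for this example, not an existing API), one could start with
something like this:

```ruby
# Hypothetical MonoKeyHash: every key must be an instance of one class.
class MonoKeyHash < Hash
  def initialize(key_class)
    super()
    @key_class = key_class
  end

  # Only []= is guarded in this sketch; store, merge!, update and
  # friends would need the same treatment in a real implementation.
  def []=(key, value)
    unless key.is_a?(@key_class)   # is_a? also accepts subclasses
      raise TypeError, "expected #{@key_class} key, got #{key.class}"
    end
    super
  end
end

ages = MonoKeyHash.new(Symbol)
ages[:alice] = 30

rejected =
  begin
    ages["bob"] = 25   # a String key in a Symbol-only Hash
    false
  rescue TypeError
    true
  end
```

Using is_a? is one possible answer to the subclass question; a stricter
variant could compare key.class with the expected class instead.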

> But since Hash uses a hash table, it is possible to have a wider
> range of key types, including both symbol and string together.
> The implementation allows it, but my question is: is it *that*
> useful in the real world? Or does it cause more harm than good?

I’d argue it is useful in that it’s a very simple model (I know
the very few rules about #hash and #eql? and I don’t have to
know, nor remember, any type-specific magic) which can be used
as the foundation for a lot of use-cases (I know I sound like
a broken record, but heterogeneous Sets are quite useful).
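
A quick example of what I mean by mixed-class Sets (the values are
arbitrary); the only rules involved are #hash and #eql?:

```ruby
require "set"

seen = Set.new
[1, 1.0, "1", :"1", nil, [1], 1, "1"].each { |item| seen << item }

distinct      = seen.size            # 6: the repeated 1 and "1" are dropped
float_present = seen.include?(1.0)   # true; 1.0 is kept apart from 1
same_key      = 1.eql?(1.0)          # false, which is why both are stored
```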

Also, Ruby is not known for treating the ‘does it cause more harm
than good’ question as a benchmark; quite a few dynamic features
that make Ruby what it is would probably fail this test, but the
language’s philosophy is more towards ‘you’re a grown-up and we
hope you know what you’re doing when you play with powerful tools’.

> I think people expect hash keys to match
> a given domain to consider them valid.

Yes, but the domain is usually specific, and I don’t think
enforcing any parts of it on all Hashes is a good idea.

>> Hm, IMHO ‘any object can be a key, just as any object can be
>> a value’ is the general case, and ‘I want my Strings and Symbols
>> to be treated the same when they’re similar, oh, and maybe with
>> the nil handled separately for convenience’ is the specialised case.

> Exactly. The specialized case is obviously bad. But the general case
> turned out not to be too great. I am thinking about third solution:
> generic, but within a specified domain - ideally were the differences
> between string and symbol stop them from unintentionally being in the
> same Hash without being too specialized. And without subclassing.

This is where we disagree – I much prefer the current Hash as the
general case (and I think it is great), and your case as specialised
to the point of either having its own Hash-like class or similar
keys returning the same #hash values so that they’re treated the
same (maybe that’s what you want? 'abc'.hash == :abc.hash when used
in certain contexts? but that’d be an even bigger hack, IMHO).

> Even by just a warning that is emitted when a Hash becomes
> unsortable, we are not breaking the association array concept
> while *still* supporting 99% or more actual real world use cases.

If I understand your concept of ‘sortableness’ right,
implementing this warning means you’d have to check this
very often, as I can undefine #<=> on any Hash key on a whim.
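
To show what I mean by undefining #<=> on a single key, here is
a small sketch (Version is a made-up class for the example):

```ruby
Version = Struct.new(:number) do
  include Comparable

  def <=>(other)
    number <=> other.number
  end
end

a = Version.new(1)
b = Version.new(2)
comparable_before = b.respond_to?(:<=>)

# Remove #<=> from this one object only; its class is left untouched.
b.singleton_class.send(:undef_method, :<=>)
comparable_after = b.respond_to?(:<=>)
```

A Hash that promised to stay sortable would have to notice this
change on a key that is already stored inside it.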

> As a side effect, if a user writes {'foo': 123}.merge('foo' =>
> 456), they will get a warning instead of just a hash with two pairs.

You could implement this by monkey-patching Hash#merge (to do
#to_sym comparison with existing keys, perhaps?), but it’d be
a performance hit hardly justifiable for a generic Hash class.
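
For the sake of discussion, a sketch of such a check written as
a stand-alone helper rather than a monkey-patch (the method name
merge_with_name_check is invented for this example). Note the extra
pass over the keys, which is the cost I have in mind:

```ruby
# Hypothetical helper: merge two Hashes, warning when a key differs from
# an existing one only by being a String in one place and a Symbol in
# the other.
def merge_with_name_check(left, right)
  name  = ->(key) { key.is_a?(String) || key.is_a?(Symbol) ? key.to_sym : nil }
  known = left.keys.group_by(&name)

  right.each_key do |key|
    sym = name.call(key)
    next if sym.nil?   # not a name-like key, nothing to compare
    clashes = known.fetch(sym, []).reject { |existing| existing.eql?(key) }
    warn "key #{key.inspect} clashes with #{clashes.inspect}" unless clashes.empty?
  end

  left.merge(right)
end

merged = merge_with_name_check({ foo: 123 }, { "foo" => 456 })
```

The result still holds two pairs, one under :foo and one under "foo";
the helper only reports the clash on standard error.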

> Such a warning most likely will help find design flaws
> and make difficult to debug errors less often when
> refactoring. And hopefully encourage a better design
> or just think a little more about the current one.

Agreed. Why are you against subclassing Hash
and coming up with a NameHash (or MonoKeyHash)?
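
Something along these lines is what I have in mind; it is only
a sketch, and both the NameHash name and the choice to normalise
everything to Symbols are assumptions made for the example:

```ruby
# Hypothetical NameHash: String and Symbol keys naming the same thing
# collapse into one Symbol key; any other key class is rejected.
class NameHash < Hash
  def []=(key, value)
    super(normalize(key), value)
  end

  def [](key)
    super(normalize(key))
  end

  def key?(key)
    super(normalize(key))
  end

  private

  def normalize(key)
    case key
    when Symbol then key
    when String then key.to_sym
    else raise ArgumentError, "not a name: #{key.inspect}"
    end
  end
end

options = NameHash.new
options["verbose"] = true
options[:verbose]  = false   # overwrites the pair above, no second key
```

Only three methods are covered here; fetch, merge, delete and the rest
would need the same normalisation before this is usable in earnest.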

> Users generally care only about their string->symbol problems
> until they realize that using strings for keys is generally
> not a good thing because of problems and debugging time.

> Implementation wise I think Hash is great. However, the flexibility
> along with symbol/string similarities and more ingenious uses of
> Hash will probably cause only more problems over time.

Again, while agreeing with both of the above, I still
think coming up with NameHash is a much better solution
than trying to make Hash outsmart the programmer.

> Python doesn't have symbols and has named arguments. In Ruby we use
> a symbol keyed Hash to simulate the latter which is great, but if
> the hash is not symbol key based, there is no quick, standard way
> to handle that. Sure, you can ignore or raise or convert, but why
> handle something you should be able to prevent?

The problem is that Ruby internals are much more about Objects
that happen to have certain similarities with each other (such
as responding to certain methods) than about Strings, Symbols
and so on. I agree that in 99% of the cases all Strings share
the same methods, but changing fundamental classes (like Hash)
unfortunately is all about handling the edge cases.

The discussion about warning a sloppy developer is similar to
whether '1' + 2 should work, and if so, whether it should be
3 or '12'. Note that Rails monkey-patches NilClass to make the
errors on nil.<method> more obvious; maybe that’s the way to go?

> Ignoring keys you don't know seems like a good idea, but the result
> is not very helpful in debugging obscure error messages. And lets
> face it: most of the Ruby code people work on is not their own.

> The only people who don't need to care are the experts
> who already have the right habits and understanding
> that allows them to avoid problems without too much
> thought. The rest have to learn the hard way.

The Ruby approach in this case is to have enough test coverage
(ideally: upfront) so that the problem is quite obvious. ;)

I understand what you mean by the ‘experts’ remark, but I’m not sure
that this case falls on the ‘expert’ side of the border; understanding
how Hashes work is quite crucial (as are the differences between #==,
#===, #eql? and #equal?, or that everything except nil and false is
truthy). I think what I’m afraid of are changes that would lead Ruby
in the PHP direction, which second-guesses the programmer so much that
"61529519452809720693702583126814" == "61529519452809720000000000000000"
(or that an empty() call on a "0" string returns true).
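
For reference, a short sketch of how the four equality methods and
truthiness behave in Ruby (the variable names are only labels for
this example):

```ruby
a = String.new("ruby")
b = String.new("ruby")

same_content   = (a == b)          # true: equal characters
same_hash_key  = a.eql?(b)         # true: equal content and class, what Hash uses
same_object    = a.equal?(b)       # false: two separate objects
loose_numeric  = (1 == 1.0)        # true: == converts between numeric types
strict_numeric = 1.eql?(1.0)       # false: so 1 and 1.0 are different Hash keys
class_match    = (Integer === 1)   # true: === on a class tests membership
range_match    = ((1..5) === 3)    # true: === on a Range tests inclusion

# Only nil and false are falsy; 0, "" and [] all count as true.
truthy = [0, "", [], nil, false].select { |value| value ? true : false }
```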

>> Hashes in Ruby serve a lot of purposes (they even
>> maintain insertion order); if you want to limit
>> their functionality, feel free to subclass.

> Why do I have to subclass Hash to get a useful named arguments
> equivalent in Ruby? Why would I want object instances for argument
> names? Why can't I choose *not* to have them in a simple way?

Well, you want a particular kind of a Hash, where you want
certain Objects (that are often used as names) to be treated
with additional diligence. You need object instances for
argument names because most of the things in Ruby are Objects
(and IMHO names definitely should be objects).

> The overhead and effort required to maintain and use a subclass
> becomes a good enough reason to give up on writing robust code.

> Which is probably what most rubists do.

Meh. I think most Rubyists are simply quite ok
with using Hashes for name → object mappings. :)

> We have RBTree and HashWithIndifferentAccess. Neither really
> helps in creating good APIs for many of the wrong reasons:

> - HWIA is for Rails specific cases but is usually
>   abused to avoid costly string/symbol mistakes

But are the mistakes really that common? It should be doable
to add guarding code to Hash#initialize if it’s really needed.
You could also argue for getting Rails’ HWIA into Ruby core
(I’m not sure whether it was proposed before or not).

> - RBTree is a gem most people don't know about and stick
>   with Hash anyway. It adds an ordering requirement but that
>   seems like a side effect. It was proposed to be added in
>   Ruby 1.9, but I don't remember why it ultimately didn't

There might’ve been good reasons (I don’t remember them either). :)

> - the {} notation is too convenient to lose in the case of
>   subclassing, especially when Hash is used for method parameters

Then make Hash#initialize smarter if you need.

> - in practice, you can only use the subclass in your own code

Well, if you come up with a practical NameHash
then I think there’s a chance it’ll end up in core.

> If Hash changed its behavior in the way described, most of
> the existing code would work as usual. Manually replacing
> {} with a subclass in a large project is a waste of time.
> Hashes are used too often to even consider subclassing.

I’d argue that ‘most of the existing code would work as usual’
and ‘Hashes are used too often to even consider subclassing’
are a bit contradictory. Hashes are so pervasive because they’re
so convenient; IMHO changing how they work would break quite
a lot of code.

As for the {} syntax: (a) NameHash could have its own
syntax sugar or (b) as I wrote above, you can try abusing
Hash#initialize to create NameHash if all the keys are
Strings/Symbols (but I can see this blowing up eventually).

> Consider regular expressions: you can specify options to a regexp,
> defining its behavior. Having the same for hashes could be cool:

> {'a' => 3, :a => 3}/so  # s = strict, o = ordered

> As examples, we could also have:

> r = uses RBTree for the Hash (and so implies 's')

> i = indifferent access, but not recommended (actually,
> I personally wouldn't want this as an option)

Hm, I’d rather have Hash#/ as a valid method (say, for
splitting Hashes into shards?) than a syntax construct.

Interestingly, regular expressions reminded me of the fact
that it might be convenient to have both Regexps and Strings
as keys in the same Hash (for matching purposes). :)
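
A sketch of that idea, with invented command names: the keys are
a mix of Strings and Regexps, and #=== does the matching for both.

```ruby
handlers = {
  "help"        => :show_help,       # String key: exact match via String#===
  /\Aadd \d+\z/ => :add_number,      # Regexp key: pattern match via Regexp#===
  /\Adel \d+\z/ => :delete_number,
}

# Returns the value of the first pair whose key matches the input,
# or nil when nothing matches.
def dispatch(handlers, input)
  _, handler = handlers.find { |pattern, _| pattern === input }
  handler
end

exact   = dispatch(handlers, "help")
pattern = dispatch(handlers, "add 42")
nothing = dispatch(handlers, "helpme")
```

This walks the pairs in insertion order, so it is a linear scan rather
than a Hash lookup; the Hash is used here as an ordered rule table.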

>> How would you treat subclasses? Let’s say I have a Hash with
>> keys being instances of People, Employees and Volunteers (with
>> Employees and Volunteers being subclasses of People). Should
>> they all be allowed as keys in a single MonoKeyHash or not?

> Good example of using a Hash to associate
> values with (even random) objects!

> Since having keys orderable already answers
> the part about allowing into the Hash,

I’m still not sure what you mean by that (and
what happens if I remove #<=> from a random key).

> I'll concentrate on the case where items are of different types.

> How about an array of objects and a hash of object id's instead?

>   [ person1, person2, ...]
>   { person1.object_id => some_value, ... }

> Or just use the results of #hash as the keys if it is about
> object contents. This makes your intention more explicit.

>   { person1.hash => some_value, ... }

> If you really need different types as a way of associating
> values with random objects, you could create a Hash of
> types and each type would have object instances:

> {
>   Fixnum => { 1 => "one", 2 => "two" },
>   String => { "1" => "one", "2" => "two" },
> }

> Then you can use hash[some_key.class][some_key] for
> access if you *really* need the current behavior.

No, I definitely want Objects to be keys in my Hashes; the trivial
example is any kind of graph and relation modelling. Say, I have
a graph with nodes being various subclasses of Person and various
subclasses of Event and I want the graph to track Person/Person
and Person/Event relations – I really want to be able to use the
various Person and Event subclasses as keys in my Hash.
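
A minimal sketch of such a graph (the class names below are made up
to mirror the description); the nodes rely on the default,
identity-based #hash and #eql? they inherit from Object:

```ruby
class Person; end
class Employee  < Person; end
class Volunteer < Person; end
class Event; end
class Meetup < Event; end

ann, bob, picnic = Employee.new, Volunteer.new, Meetup.new

# Adjacency lists keyed directly by the node objects.
relations = Hash.new { |hash, node| hash[node] = [] }
relations[ann]    << bob      # Person/Person relation
relations[ann]    << picnic   # Person/Event relation
relations[picnic] << bob      # Event/Person relation
```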

> #hash and #eql? are called by Hash internally - if there
> is a good reason for redefining these, there is probably
> a good way to do it without relying on Hash internals.

Hm, I think I totally disagree – #hash and #eql? are
the public interface of Hash, and the contract is that
anything that implements these can be used as a Hash key.
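
To make the contract concrete, a sketch with a made-up Point class
that implements just these two methods:

```ruby
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x, @y = x, y
  end

  # Equal points must report equal hashes...
  def hash
    [x, y].hash
  end

  # ...and eql? decides whether two candidates really are the same key.
  def eql?(other)
    other.is_a?(Point) && x == other.x && y == other.y
  end
end

labels = { Point.new(0, 0) => "origin" }
found  = labels[Point.new(0, 0)]   # a different but equal object
```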

> If for some fictional reason Ruby used an rbtree internally
> for Hash, #<=> would be used instead of #hash + #eql. Everything
> else would be the same except for allowed key values.

Ok, but then you change the Hash API (from #hash + #eql? to #<=>).
#<=>-based NameHash would make perfect sense (and you could then
experiment with 'abc' <=> :abc returning 0), but the fundamental
reason behind the current Hash implementation is that #hash doesn’t
have to return different values for objects that are not #eql?

This allows you to come up with a very fast #hash implementation
that needs to be ‘right’ only most of the time and then Hash
will fall back to checking whether a.eql?(b) only if
a.hash == b.hash; for example, if your Hash keys are very long
Strings that happen to almost always differ and when they do
they differ on the first couple of characters, you can optimise
their #hash method by making it consider only the first couple
of characters – the (comparatively slow) #eql? method will then
be called only rarely, when the Strings are most probably the same
anyway (but in such cases you need to scan them in full anyway).
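
A sketch of this trade-off, assuming a made-up PrefixKey wrapper
whose #hash looks at the first eight characters only (the call
counter is there purely to make the fallback visible):

```ruby
class PrefixKey
  attr_reader :text

  @eql_calls = 0
  class << self
    attr_accessor :eql_calls
  end

  def initialize(text)
    @text = text
  end

  # Cheap: looks at the first 8 characters only.
  def hash
    text[0, 8].hash
  end

  # Full comparison, needed once the cheap hashes are equal.
  def eql?(other)
    self.class.eql_calls += 1
    other.is_a?(PrefixKey) && text == other.text
  end
end

table = {
  PrefixKey.new("alpha" + "x" * 1000) => 1,
  PrefixKey.new("omega" + "x" * 1000) => 2,
}
found = table[PrefixKey.new("alpha" + "x" * 1000)]   # equal hash, eql? confirms
```

Keys that share their first eight characters still end up as separate
entries; they merely cost an extra #eql? call each time they meet.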

> Novice users find symbols, strings and Hashes complicated and
> confusing. Changing this is my focus here. A complex model that
> is easily discoverable is probably better than a simple model
> that requires complex solutions from the users to do a great job.

I strongly disagree here; a simple model which, in addition, is fairly
easy to explain (it’s only #hash and #eql?, really), is much better
than a complex model carried around only for the sake of novices.

I see where PHP ended up with its ‘novice-friendly’ approach and
it’s awful – and I strongly believe a simple and consistent
model is actually more novice-friendly in the long run than
wondering why '0' == false and false == null but '0' != null.

> Even if it results in an overly complex parser and
> implementation, I think only good will come from going
> out of one's way to make Ruby users lives easier.

Definitely – it’s just that we disagree on what is easier (in the long
run). I agree it might be useful to have NameHash for name → object
mappings, MonoKeyHash that keeps the keys in check and/or a #<=>-based
ComparisonHash and I encourage you to implement them and push for them
to be included in the core; I’m simply very grateful for the extremely
well thought-out and versatile Hash we have now and I’d rather it’s
not made more complicated (or dumbed-down) for the sake of a (granted,
popular) single use-case.

> Which is why I really appreciate your input and for giving me
> the motivation to understand the topic and Ruby internals better.

I really like this discussion as well! Thanks for bringing this up.

– Piotr Szotkowski
-- 
7.times{k=0;puts ($*.map!{|i|k+k=i}<<1)*" "}




--ikeVEW9yuYc//A+q--