On Sat, Jun 04, 2011 at 02:17:28AM +0900, Piotr Szotkowski wrote:
> Apologies for the delayed reply; it takes
> a bit to digest such a detailed response! :)

Oh, don't apologize - my fault for being way too elaborate and taking
so much of your time.

The topic got me really thinking about some concepts. Here is an
overview:

1. Regarding coding issues only, I still don't see the difference
   between Hash and RBTree as a feature. I don't see #hash + #eql? as
   being superior in this regard to #<=>. The Hash API falls into the
   YAGNI category for users, if you ask me. RBTree is a good reference
   on how a hash can work with #<=>. (RBTree wasn't included because
   it wasn't mature enough at the time).

2. I patched Ruby to warn about cases where key type mixing takes
   place. The cases that popped up didn't justify the need for Hash's
   generic behavior (though I only checked a few things).

   The result of this "experiment", however, convinced me that adding
   "Ruby best practice" warnings is both valuable and easy. If only it
   were easier to turn them on and off in code...

   The interesting conclusion is that such changes don't have to be in
   the standard MRI to be useful. There could even be a patched "lint"
   version of Ruby that could warn about not using the shorthand hash
   syntax.

3. Just for reference: maybe I didn't make myself clear, but the last
   thing I want is to have symbols compared to strings. HWIA handles a
   Rails specific case for convenience, so it doesn't count.

   The idea that '3' + 3 doesn't work is something I find very useful.
   Likewise, if an integer keyed hash didn't merge with a string keyed
   hash, I would find such a case very similar.

The PHP way would be to discover that "Array is a special case of Hash
(implementation aside), with integers as keys, so why not create a
general array/hash class, call it array and have one class less for
novices?" That didn't turn out nice IMHO.
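To illustrate point 3, here is a minimal sketch of the contrast, assuming stock Ruby with nothing patched. The variable names are made up for the example. String#+ rejects a mixed operand, while Hash accepts mixed key types silently:

```ruby
# String + Integer is rejected outright.
plus_error = begin
  '3' + 3
  nil
rescue TypeError => e
  e.class
end

# Hash#merge accepts mixed key types without complaint:
# the Integer key and the String key end up side by side.
merged = { 1 => 'int' }.merge('1' => 'str')

# A Symbol key and a String key with the same spelling are
# distinct entries, so this lookup quietly misses.
opts = { name: 'x' }
missing = opts['name']
```

The first case fails loudly at the point of the mistake; the other two only show up later, as a hash with more entries than expected or a nil where a value should be.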
The PHP way would be to drop symbols because they are too difficult to
grasp - or make symbols and strings coerce to one another, which is
probably worse.

As an analogy, my approach of making Hash more strict seems to be like
making the hurdles more difficult to jump over, but pasting on them
instructions about what you need to learn to clear them. Hardly the
PHP approach if you ask me.

So, from my point of view, slightly putting the generalized behavior
"out of view" (but not out of reach) would get people to sit back and
think more about design, and not just reach for what they know. Much
better conditions for learning than debugging.

> I’d argue it is useful in that it’s a very simple model

By contrast, RBTree also seems simple - at least to me. Although the
name doesn't suggest how similar it is to Hash.

> Also, Ruby is not known for treating the ‘does it cause more harm
> than good’ question as a benchmark

True. With so many interesting languages popping up, syntax will
probably become more important for Ruby's success in the future.

We already have two very successful Rubies: 1.8.7 and 1.9.2. With such
a long history already, it is now easier to make harder decisions
about the language and syntax with less risk.

> Yes, but the domain is usually specific, and I don’t think
> enforcing any parts of it on all Hashes is a good idea.

I thought so too and I'm not saying it definitely is - but I still
cannot think of practical reasons why.

> (maybe that’s what you want? 'abc'.hash == :abc.hash when used
> in certain contexts? but that’d be even bigger a hack, IMHO).

No, that is the case I would like to prevent from occurring!

I'm guessing a lack of #<=> could be worked around by using #hash,
#eql? and using object_id to determine order predictably. But I didn't
really think this through and I haven't looked that deeply into
RBTree.

> (...) as I can undefine #<=> on any Hash key at a whim.

Not sure what you mean. You can undefine #hash also.
Breaking things is ok, as long as fixing them is quick and simple
IMHO. Unit tests are great for keeping broken things from leaving
one's file system.

> Why are you against subclassing Hash and coming up with a NameHash
> (or MonoKeyHash)?

It still seems like treating just the symptom. And it suggests too
much duplication - the differences are just slight behavior
differences. And it feels too Java-ish for Ruby.

Maybe I feel like subclassing Hash is more work than it should be.
Consider the following as alternatives from a design perspective:

  # filter (ignore garbage) + sort, convert from array
  Hash[{z: 0, a: 1, 'b' => 2}.select {|k, _| k.is_a?(Symbol)}.sort]
  #=> {:a=>1, :z=>0}

  # filter (validate) + sort, convert from array
  Hash[{z: 0, a: 1, 'b' => 2}.each {|k, _| raise ArgumentError unless k.is_a?(Symbol)}.sort]
  #=> ArgumentError

And the following:

  # no filtering, always sorted, no invalid state, remains RBTree
  RBTree[{z: 0, a: 1, 'b' => 2}]
  #=> ArgumentError

I'd probably prefer mixins that can be included in Hash. But I'm
unsure how that would turn out. Again, refinements come to mind, but I
wonder if the current API is easily ... "refinable". And maybe allow
for optimizations.

Here is an example of what I mean (the option methods are made up):

  a = {}.add_option(inserting: {order: :sort_key, duplicates: :raise})
        .set_option(default: 'X')
        .add_option(inserting: {->(k) { !k.is_a?(Symbol) } => :raise})

  a.merge(z: 3, a: 1)       #=> {a:1, z:3}
  a.merge(z: 3, a: 1).sort  #=> {a:1, z:3} (no sorting required)
  a.keys.to_a.sort          #=> [:a, :z]   (no sorting required)
  a[:foo]                   #=> 'X', works like block given to Hash
  a['foo']                  #=> raises an ArgumentError

  [].add_option(inserting: {duplicates: :merge}) #=> effectively a Set

Refinements would minimize the need for this.

The only problem I can see now with the Ruby API is that people want
to override behavior and not methods - this makes subclassing more
difficult than it should be IMHO.
For example #[], #[]= and merge can add items, but you cannot just
override 'add_item' (st_insert() I believe).

> Again, while agreeing with both of the above, I still
> think coming up with NameHash is a much better solution
> than trying to make Hash outsmart the programmer.

If I could do {a: 3}.to_symhash I guess that would work out ok.

> I agree that in 99% of the cases all Strings share
> the same methods, but changing fundamental classes (like Hash)
> unfortunately is all about handling the edge cases.

If I had more control over what can be in a hash, I would have far
fewer edge cases to worry about. Same with other types.

> The discussion about warning a sloppy developer is similar to
> whether '1' + 2 should work, and if so, whether it should be
> 3 or '12'.

These examples are obvious errors. For Hash compatibility I proposed
just a warning, or making Hash key mixing deprecated. But that assumes
restricting Hash is actually valuable - which I am unsure of.

> Note that Rails monkey-patches NilClass to make the errors on
> nil.<method> more obvious; maybe that’s the way to go?

It is why I preferred to hack rb_hash instead of subclassing. It is a
simple task and it handles internal calls to rb_hash as well.

> The Ruby approach in this case is to have enough test coverage
> (ideally: upfront) so that the problem is quite obvious. ;)

Aggressive TDD is how I learned root cause analysis (I hope). Adding a
touch of Design by Contract may help reduce some unnecessary edge
cases without resorting to too much intelligence.

> I understand what you mean by the ‘experts’ remark, but I’m not sure
> that this case falls on the ‘expert’ side of the border; understanding
> how Hashes work is quite crucial

Sure, but not necessarily on the first page of a Ruby tutorial. With
disciplined TDD you actually get quite far IMHO without understanding
the details. Refactoring is actually a good time for learning such
things. And warnings are a good way to focus more deeply on a given
subject.
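The earlier remark about overriding behavior rather than methods can be made concrete with a rough sketch. SymbolHash and check_key! are hypothetical names, not anything in core or in Rails, and the sketch assumes a Ruby whose Hash#merge! accepts several hashes. Every insertion path needs its own override, because there is no single Ruby-level "add item" hook to wrap:

```ruby
# Hypothetical SymbolHash: a Hash subclass that rejects non-Symbol keys.
# Each insertion path is overridden separately; entry points such as
# #replace or the Hash[] constructor remain unguarded in this sketch.
class SymbolHash < Hash
  def []=(key, value)
    check_key!(key)
    super
  end

  def store(key, value)
    check_key!(key)
    super
  end

  def merge!(*others, &block)
    others.each { |other| other.each_key { |key| check_key!(key) } }
    super
  end

  def update(*others, &block)
    others.each { |other| other.each_key { |key| check_key!(key) } }
    super
  end

  # Non-destructive merge: copy first, then reuse the guarded merge!.
  def merge(*others, &block)
    dup.merge!(*others, &block)
  end

  private

  def check_key!(key)
    return if key.is_a?(Symbol)
    raise ArgumentError, "Symbol key expected, got #{key.inspect}"
  end
end

names = SymbolHash.new
names[:a] = 1
names.merge!({ z: 3 })
```

Five overrides for one rule, and the list is still incomplete, which is roughly the "more work than it should be" complaint.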
> Well, you want a particular kind of a Hash

More like just a particular behavior, but I'm otherwise nodding my
head reading your comments.

> But are the mistakes really that common? It should be doable
> to add guarding code to Hash#initialize if it’s really needed.
> You could also argue for getting Rails’ HWIA into Ruby core
> (I’m not sure whether it was proposed before or not).

Yes it was. The reasoning behind the arguments for including it
suggested exactly that - that mistakes are common.

> Then make Hash#initialize smarter if you need.

Hash already has a block for default values. If I could define a block
that is called for every implicitly added item, and have that block
work with #merge, it might be a good solution.

  >> a = Hash.new {|_, key| raise unless key.is_a?(Symbol)}
  >> a['a']           #=> RuntimeError
  >> a.merge(3 => 4)  #=> {3=>4} (no error)

> Well, if you come up with a practical NameHash
> then I think there’s a chance it’ll end up in core.

It becomes more practical once it is in core ;) Chicken and egg
problem.

Proving it *is* practical may be a problem. Proving it isn't would be
the easiest - if I knew how. And it would result in much shorter
threads on ruby-core...

> As for the {} syntax: (a) NameHash could have its own
> syntax sugar

It has too much in common with Hash - the {} syntax is one of the
reasons I started considering replacing Hash. As for alternatives,
what is left? ('a' => 3), %h{a: 3}, ... ?

> or (b) as I wrote above, you can try abusing Hash#initialize to
> create NameHash if all the keys are Strings/Symbols (but I can see
> this blowing up eventually).

Actually blowing up (if I understand you correctly) is better than
silent failure and long hours of debugging through a great big
metaprogramming jungle.

I don't want the extreme where getting Ruby to interpret code becomes
a mind-bending puzzle challenge (those experienced in strongly typed
languages may find this familiar), but on the other hand - I don't
think Ruby has reached the sweet spot yet.
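The default-block example from earlier in this reply can be expanded into a small runnable sketch, assuming stock Ruby; the variable names are made up. It shows that the block only runs when a missing key is read through #[], so writes and #merge get past it:

```ruby
strict = Hash.new do |_hash, key|
  raise ArgumentError, "bad key: #{key.inspect}" unless key.is_a?(Symbol)
end

# Reading a missing non-Symbol key triggers the block.
read_raised = begin
  strict['a']
  false
rescue ArgumentError
  true
end

# Writing never consults the default block, so the String key goes in.
strict['a'] = 1

# The key now exists, so reading it skips the block as well.
after_write = strict['a']

# #merge does not consult the block either.
merged = strict.merge(3 => 4)
```

So the default block is a guard on one read path only, which is why it cannot stand in for a check on insertion.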
> > {'a' => 3, :a => 3}/so # s = strict, o = ordered
> Hm, I’d rather have Hash#/ as a valid method (say, for
> splitting Hashes into shards?) than a syntax construct.

Sure. That was just random brainstorming - but I don't really like it
myself. It is too specific. But then again - hashes and arrays won't
change dramatically over time.

> Interestingly, regular expressions reminded me of that fact
> that it might be convenient to have both Regexps and Strings
> as keys in the same Hash (for matching purposes). :)

That sounds crazy but you have a point. I wonder if after 10 years of
abusing hashes people will reinvent LISP as a result. Or everyone will
be configuring their favorite Ruby syntax upon installation.

> >> Let’s say I have a Hash with keys being instances of People,
> >> Employees and Volunteers (with Employees and Volunteers being
> >> subclasses of People). Should they all be allowed as keys in a
> >> single MonoKeyHash or not?

I'm not sure about the actual use case, so try it with RBTree and see
for yourself.

> I’m still not sure what you mean by that (and what happens if I
> remove #<=> from a random key).

Not sure here either. Try it with RBTree.

> Say, I have a graph with nodes being various subclasses of Person
> and various subclasses of Event and I want the graph to track
> Person/Person and Person/Event relations; I really want to be able
> to use the various People and Event subclasses as keys in my Hash.

Yes, but why in the same hash? Why not two different hashes? What is
the common behavior between Event and Person? You are probably going
to iterate the graph in order to ... ?

You can always add a level of indirection; then your graph will become
more generic and reusable. And effectively, you are hashing object
contents - I'm not sure that is really what you want. My intuition
tells me that the case you describe is refactorable.

> Hm, I think I totally disagree #hash and #eql?
> are the public interface of Hash, and the contract is that
> anything that implements these can be used as a Hash key.

True, but I was referring to a higher level of abstraction, that of an
associative array, which Hash is intended for (but not limited to):

  a[b] = x  (association)
  a[b]      (referencing)

At this level, Hash, RBTree and even Array are identical. #hash and
#eql? are specific to one associative array implementation. RBTree
doesn't use hashing, but serves the same purpose. The difference is
that the implementation restricts the items available for keys.

> I strongly disagree here; a simple model which, in addition, is fairly
> easy to explain (it’s only #hash and #eql?, really), is much better
> than a complex model carried around only for the sake of novices.

Could you say what exactly is complex? I always thought an RBTree was
simpler to understand than "Hash", which to me initially worked
"magically", and sometimes I still get hashes and identities mixed up.

By analogy, an even stricter "hash" - Array - is even less confusing:

  a = [1,2,3]
  a[0]    #=> 1
  a[nil]  #=> TypeError (!)

And it is limited to integers specifically. And I don't have to run
'ri' or look into array.c to work it out.

We could discuss whether a[nil] fails for common sense reasons or for
implementation reasons. Maybe in the same way I'm not getting that in
practice, associative arrays are always hash based and it is obvious
for everyone but me.

> I see where PHP ended up with ‘novice-friendly’ approach and
> it’s awful

I totally agree.

> and I strongly believe a simple and consistent model is actually
> more novice-friendly in the long run than wondering why '0' == false
> and false == null but '0' != null.

Handling these cases says it all (isset, isnull, etc). For me PHP is
incredibly difficult both to learn and even to tolerate. The only hope
for PHP at this point would be to start undoing the "novice helping"
and start generating errors and warnings.
In case of doubt, a clear common syntax is the best criterion if you
ask me. With an error or warning you at least have a question to start
with.

I'm not sure what you mean by model and in what way it is
novice-friendly. Do you mean an implementation that is easier to
understand?

I personally think that clear syntax wins in the long run: how intent
maps to code. The underlying model can change many times and be as
complicated as possible and I wouldn't really care. Probably because I
spend more time in Ruby and none in C.

> > Even if it results in an overly complex parser and implementation,
> > I think only good will come from going out of one's way to make
> > Ruby users lives easier.
>
> Definitely; it’s just we disagree on what is easier (in the long
> run).

I'll be quick to correct myself: easier for users to become productive
and happy (and rich?) experts delivering valuable software.

> I agree it might be useful to have NameHash for name → object
> mappings,

I would just stick with SymbolHash and have Hash for strings.

> MonoKeyHash that keeps the keys in check

This would probably be a copy of the rb_hash implementation, where
type checking is both simple and cheap. Not sure how to handle objects
though.

> and/or a #<=>-based ComparisonHash

meaning basically to help RBTree get adopted - maybe with a nicer,
less scary name

> and I encourage you to implement them and push for them
> to be included in the core

I still think I lack the necessary understanding, so I'll spend some
more time researching actual hash usage along with external libraries
in this area (AS, facets, extlib).

> I’m simply very grateful for the extremely well though-out and
> versatile Hash we have now and I’d rather it’s not made more
> complicated (or dumbed-down) for the sake of a (granted, popular)
> single use-case.

Could you explain that using Array and RBTree as examples? Is Array a
dumbed-down hash? Is RBTree overcomplicated?

> I really like this discussion as well!
> Thanks for bringing this up.

Thanks for your time. Thanks to you I made a lot of new distinctions
between Ruby core concepts!

Initially I wanted to contribute, but I ended up just increasing my
own knowledge for now.

If I find some interesting patterns in how Hash (or Ruby in general)
is (ab)used, I'll post them as a new thread with possible ways Ruby
could help simplify/fix things.

P.S. Looking at my reply ... I'm sure even the mail server deserves a
break after this.

--
Cezary Baginski