This looks astounding. Quick nit: is it #categorize or #associate? I like
#categorize as a name for this more, but you've given code samples with
#associate as the working title of the method.

I really like what you've presented here.

I have one idea for how to handle varying key lengths: I think I'd like an
option (or even the default case, but at least an option) to have a replacing,
mixed-mode result: both values and nested hashes are allowed at the same level,
and a value can be replaced with a nesting level (or vice versa) if key
duplication occurs. Example where the categories are fruits:

[  [:aaa, "plum"],
   [:aaa, :bbb, "banana"],
   [:aaa, :ccc, "lemon"],
   [:foo, :bar, "pear"],
   [:foo, "apple"],
   [:zzz, "orange" ]  ].categorize { |a| a }

should give:

 {:aaa => {:bbb => "banana", :ccc => "lemon" },
  :foo => "apple",
  :zzz => "orange" }

It's a neat way to provide a useful result, as well as to avoid the overhead of
having to actually check for this... though the overhead is asymptotically
negligible.
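
To make the replacing, mixed-mode behaviour concrete, here is a minimal sketch.
It is an assumption about how this could work, not part of the proposal:
`categorize_mixed` is a hypothetical standalone helper that takes rows of the
form [key, ..., value] (at least one key each), rather than an Enumerable
method taking a block.

```ruby
# Hypothetical helper, not part of the proposal: builds a nested hash from
# [key, ..., value] rows, letting later rows replace earlier ones.
def categorize_mixed(rows)
  rows.each_with_object({}) do |row, top|
    *keys, value = row
    *path, last = keys
    # Walk down the key path; a non-Hash found along the way is replaced
    # by a fresh nesting level.
    node = path.inject(top) do |cur, k|
      cur[k] = {} unless cur[k].is_a?(Hash)
      cur[k]
    end
    # Plain overwrite: a value may also replace an existing nesting level.
    node[last] = value
  end
end

fruits = [
  [:aaa, "plum"],
  [:aaa, :bbb, "banana"],
  [:aaa, :ccc, "lemon"],
  [:foo, :bar, "pear"],
  [:foo, "apple"],
  [:zzz, "orange"]
]

result = categorize_mixed(fruits)
# result == {:aaa => {:bbb => "banana", :ccc => "lemon"},
#            :foo => "apple",
#            :zzz => "orange"}
```

Under this sketch, the varying-length case quoted below,
[[:foo, :bar], [:foo, :bar, :baz]], would come out as {:foo => {:bar => :baz}},
with the later row silently replacing the earlier value.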

Just food for thought.

Cheers,
Michael Edgar
adgar / carboni.ca
http://carboni.ca/

On Mar 26, 2011, at 11:25 PM, Marc-Andre Lafortune wrote:

> Following the comments of Akira and others, here's a revised proposal
> merging his original Enumerable#categorize with my previous version.
>
> Like Akira's categorize, it now:
> * can produce nested hashes
> * returns an Enumerator when not given a block
>
> Like my original proposal, it still:
> * has a simple interface with a single argument for special merges
> * does not produce "grouped hashes" by default
>
> Here is what the documentation could read like:
>
>    enum.associate(merge = nil){ block } # => a_hash
>
>    Invokes block once for each element of +enum+. Creates a new hash based on
>    the values returned by the block. These values are interpreted as a sequence
>    of keys and the final value.
>
>       (1..3).associate {|e| ["#{e} + #{e}", e+e] }
>         #=> {"1 + 1" => 2, "2 + 2" => 4, "3 + 3" => 6}
>
>    If more than one key is specified, the resulting hash will be nested.
>
>       (0..7).associate {|e| [e&4, e&2, e&1, e] }
>         #=> {0=>{0=>{0=>0,
>                      1=>1},
>                  2=>{0=>2,
>                      1=>3}},
>              4=>{0=>{0=>4,
>                      1=>5},
>                  2=>{0=>6,
>                      1=>7}}}
>
>    If no key is specified, either because the block returned an array
>    with fewer than two elements, or because the returned value is not an Array,
>    then the key is assumed to be the yielded element itself
>    (or the first element in case many elements are yielded):
>
>       (1..4).associate{|i| i ** i} # => {1 => 1, 2 => 4, 3 => 27, 4 => 256}
>       {:foo => 2, :bar => 3}.associate{|k, v| v ** v}
>         # => {:foo => 4, :bar => 27}
>
>    In case of key duplication, +merge+ will be used. If +nil+, the value
>    is overwritten. Otherwise the stored value will be the result of calling
>    `merge` with the arguments +key+, +first_value+ and +other_value+
>    (see Hash#merge). In a similar way to `Enumerable#inject`, passing a symbol
>    for +merge+ is equivalent to passing
>    <tt>->(key, first, other){ first.send(merge, other) }</tt>
>
>       x = [[:foo, 10], [:bar, 30], [:foo, 32]]
>       x.associate{|e| e}                    # => {:foo => 32, :bar => 30}
>       x.associate(->(k, a, b){a}){|e| e}    # => {:foo => 10, :bar => 30}
>       x.associate(:+){|e| e}                # => {:foo => 42, :bar => 30}
>       x.associate(:concat){|k, v| [k, [v]]} # => {:foo => [10, 32], :bar => [30]}
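
For comparison, the (key, first_value, other_value) convention described above
is the same one the block form of Hash#merge already uses. A small sketch with
plain hashes; the variable names are only for illustration, and the lambda is
the hand-written equivalent of passing :+ as +merge+:

```ruby
first = { foo: 10, bar: 30 }
other = { foo: 32 }

# Hash#merge calls the block only for keys present in both hashes.
sum = first.merge(other) { |key, first_value, other_value| first_value + other_value }

# The symbol-to-lambda conversion from the quoted docs, written out by hand.
merge = :+
as_lambda = ->(key, a, b) { a.send(merge, b) }
same = first.merge(other) { |key, a, b| as_lambda.call(key, a, b) }

# Both sum and same hold 42 under :foo and 30 under :bar.
```
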
>
>
> A question that remains is: should there be special checks for cases
> where the result has varying length?
> E.g., what error should the following raise (or what should be the result):
>
>    [[:foo, :bar], [:foo, :bar, :baz]].associate{|x| x}  # => ??
>
> Here is what a Ruby implementation could look like:
>
> module Enumerable
>   def associate(merge_func = nil)
>     return to_enum(__method__, merge_func) unless block_given?
>
>     if merge_func.is_a? Symbol
>       sym = merge_func
>       merge_func = ->(k, v1, v2){v1.send(sym, v2)}
>     end
>
>     top_level_hash = {}
>     each do |*elems|
>       result = yield(*elems)
>       # dup so that popping below does not mutate an array owned by the caller
>       result = result.is_a?(Array) ? result.dup : [result]
>       value = result.pop
>
>       if result.empty? # deduce key
>         key = elems.first
>         key = key.first if key.is_a?(Array)
>         initial_keys = []
>       else
>         key = result.pop
>         initial_keys = result
>       end
>
>       final_hash = initial_keys.inject(top_level_hash){|cur_h, k| cur_h[k] ||= {}}
>
>       if merge_func && final_hash.has_key?(key)
>         value = merge_func.call(key, final_hash[key], value)
>       end
>       final_hash[key] = value
>     end
>     top_level_hash
>   end
> end
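
On the varying-length question above: if I read this implementation correctly,
[[:foo, :bar], [:foo, :bar, :baz]] would currently fail with a NoMethodError,
because the `cur_h[k] ||= {}` descent returns the value stored by the first row
instead of a Hash. The sketch below isolates just that step; the variable names
are mine, not from the implementation:

```ruby
top = {}
top[:foo] = :bar            # what the first row, [:foo, :bar], stores

# Second row, [:foo, :bar, :baz]: the descent finds :bar, not a Hash,
# and ||= leaves it in place because :bar is truthy.
node = (top[:foo] ||= {})

begin
  node[:bar] = :baz         # Symbol does not define []=
rescue NoMethodError => e
  error = e
end
```

So some explicit decision (raise a clearer error, or replace as I suggested)
seems to be needed either way.
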
>
>
> Thanks
> --
> Marc-André
>