2010/12/27 Marc-Andre Lafortune <ruby-core-mailing-list / marc-andre.ca>:
>
> I have an alternate proposition of a modified `categorize` which I
> believe addresses the problems I see with it:
> 1) Complex interface (as was mentioned by others)

I don't think your 'associate' is so simple.
Some parts are simpler than 'categorize'.
Some parts are more complex than 'categorize'.

> 2) By default, `categorize` creates a "grouped hash" (like group_by),
> while there is not (yet) a way to create a normal hash. I would
> estimate that most hash created are not of the form {key => [some
> list]} and I would rather have a nicer way to construct the other
> hashes too. This would make for a nice replacement for most
> "inject({}){...}" and "Hash[enum.map{...}]".

Possible.

There are two reasons why I proposed a method for "grouped hash" first.
* It doesn't lose information on key conflict.
* I (and matz) don't have a good (enough) method name for "normal hash".
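
To illustrate the first point with methods that already exist, here is a
small sketch. The sample data is made up for illustration only.

```ruby
# Made-up sample data: the key :a appears twice.
pairs = [[:a, 1], [:b, 2], [:a, 3]]

# Grouped hash: both values for :a are kept.
grouped = {}
pairs.each {|k, v| (grouped[k] ||= []) << v }
# grouped is {:a=>[1, 3], :b=>[2]}

# Normal hash: the later pair silently replaces the earlier one.
normal = Hash[pairs]
# normal is {:a=>3, :b=>2}
```

The value 1 for :a cannot be recovered from the normal hash.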

I'm not sure that matz will be satisfied with the name 'associate'.

> My alternate suggestion is a simple method that uses a block to build
> the key-value pairs and an optional Proc/lambda/symbol to handle key
> conflicts (with the same arguments as the block of `Hash#merge`). I
> would name this simply `associate`, but other names could do well too
> (e.g. `mash` or `graph` or even `to_h`).

'categorize' and 'associate' differ as follows.

* 'associate' creates a normal hash.

  This is an intentional difference.

* 'associate' doesn't create a nested hash.

  'associate' is simpler here.

  I think 'associate' can be extended naturally so that it creates a
  nested hash when the block returns an array with 3 or more elements.

  For the example in [ruby-talk:372481],
  your 'associate' (without the above extension) handles only that one nest
  level, but 'categorize' handles any nest level.

  >    dest == orig.categorize(:op=>lambda {|x,y| y }) {|e| e }
  >    dest == orig.associate(:merge){|a, b, c| [a, {b=>c}]}
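
  The extension to arrays with 3 or more elements could look roughly like
  the following sketch. nest_rows is a hypothetical helper name, not part
  of either proposal, and it uses "last value wins" on conflict as an
  assumption.

```ruby
# Hypothetical helper: each row is [key1, key2, ..., value].
# All elements but the last are used as nested hash keys.
def nest_rows(rows)
  rows.each_with_object({}) do |row, h|
    *outer_keys, last_key, value = row
    # Walk (and create) the intermediate hashes for the outer keys.
    inner = outer_keys.inject(h) {|node, k| node[k] ||= {} }
    inner[last_key] = value   # assumption: the last value wins on conflict
  end
end

two_level   = nest_rows([[:a, :x, 1], [:a, :y, 2], [:b, :x, 3]])
three_level = nest_rows([[:a, :x, :p, 1], [:a, :x, :q, 2]])
# two_level   is {:a=>{:x=>1, :y=>2}, :b=>{:x=>3}}
# three_level is {:a=>{:x=>{:p=>1, :q=>2}}}
```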

* 'associate' assumes {|v| v } if the block is not given.

  This simplifies some usages.
  However, it rules out Ruby 1.9 style enumerator creation,
  where a method returns an enumerator when no block is given.
  This means we cannot write enum.associate.with_index {|v, i| ... }.
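
  The enumerator convention can be sketched as follows. sketch_categorize
  is a hypothetical, much reduced stand-in (no options, grouping only),
  written just to show why leaving the no-block case free matters.

```ruby
module Enumerable
  # Hypothetical reduced grouping method, for illustration only.
  def sketch_categorize
    # Ruby 1.9 convention: no block given, so return an enumerator.
    return to_enum(:sketch_categorize) unless block_given?
    h = {}
    each do |e|
      k, v = yield(e)       # the block returns a [key, value] pair
      (h[k] ||= []) << v
    end
    h
  end
end

# Because the no-block call returns an enumerator, with_index can be chained.
h = [1, 2, 5, 13].sketch_categorize.with_index {|e, i| [e, i] }
# h is {1=>[0], 2=>[1], 5=>[2], 13=>[3]}
```

  If the method assumed {|v| v } when called without a block, the chained
  form would have nothing to attach to.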

* 'associate' handles non-array block values.

  This is more complex than 'categorize'.

  I feel it is a somewhat distorted specification,
  especially the "(first)" in "Otherwise the value is the result of the block
  and corresponding key is the (first) yielded item."

  'categorize' could adopt it, but I don't want to.

* 'associate' doesn't use a hash argument.

  This may be a good idea.

  'categorize' needs a hash argument mainly because
  it must distinguish whether the merge function needs the key or not.
  (A proc specified by :update needs the key.
  A proc specified by :op doesn't need the key.)

  'associate' distinguishes them by symbol or proc.
  This could be applied to 'categorize' too.

  However, a symbol and symbol.to_proc would then behave differently.
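
  A rough sketch of that dispatch is below. combine is a hypothetical
  helper, and the rule "symbol means no key, proc means key first" is my
  reading of the proposal, not a confirmed specification.

```ruby
# Hypothetical dispatch: a Symbol is sent to the old value (no key),
# while a Proc receives (key, old, new) like the block of Hash#merge.
def combine(merger, key, old, new)
  case merger
  when Symbol then old.public_send(merger, new)
  when Proc   then merger.call(key, old, new)
  else raise ArgumentError, "unsupported merger: #{merger.inspect}"
  end
end

by_symbol = combine(:+, :k, 1, 2)                            # 1 + 2
by_proc   = combine(lambda {|k, a, b| a + b }, :k, 1, 2)     # key is ignored

# :+.to_proc is a Proc, so it gets (key, old, new) and tries :k.+(1, 2).
to_proc_failed =
  begin
    combine(:+.to_proc, :k, 1, 2)
    false
  rescue NoMethodError, ArgumentError
    true
  end
```

  So passing :+ and passing :+.to_proc would not be interchangeable under
  such a rule.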

* 'associate' doesn't have a way to specify the seed.

  This is a simpler specification than 'categorize',
  but it makes some usages more complex.

  'associate' could be extended to take a second optional argument for the seed.

  In your 'associate' examples for [ruby-talk:347364] and
  [ruby-talk:327908], array and string concatenation is O(n**2).
  (n is the (maximum) number of elements in a category.)

  >    p dest == orig.associate(:+){|h, v| [h, [v]]}
  a = [v1]
  a = a + [v2]
  a = a + [v3]
  ...

  >    orig.associate(->(k, a, b){"#{a} #{b}"})
  s = v1
  s = "#{s} #{v2}"
  s = "#{s} #{v3}"
  ...

  To avoid this inefficiency, a destructive concatenation method
  can be used:

  >    # or if duping the string is required (??):
  >    orig.associate(->(k, a, b){a << " " << b}){|x, y| [x, y.dup]}

  However, the dup is required to avoid modifying the receiver, orig.

  I think a seed is a simple way to avoid both O(n**2) and receiver modification
  without extra objects, as follows.

  >     orig.categorize(:seed=>nil, :op=>lambda {|x,y| !x ? y.dup : (x <<
  >    " " << y) }) {|e| e }
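
  Since neither method exists yet, the difference can be sketched with
  plain hashes. The sample data is made up, and the op lambda mirrors the
  one passed as :op above, with a missing hash entry playing the role of
  the nil seed.

```ruby
# Made-up sample data; strings are built with dup so mutation is allowed.
orig = [["a", "x".dup], ["a", "y".dup], ["b", "z".dup]]

# Non-destructive: every step allocates a new, longer string.
slow = {}
orig.each {|k, v| slow[k] = slow.key?(k) ? "#{slow[k]} #{v}" : v }

# Seed of nil: dup only the first value, then append in place.
op = lambda {|x, y| !x ? y.dup : (x << " " << y) }
fast = {}
orig.each {|k, v| fast[k] = op.call(fast[k], v) }

# Without the dup, the first string of each category is modified in place.
copy = orig.map {|k, v| [k, v.dup] }
leaky = {}
copy.each {|k, v| leaky.key?(k) ? (leaky[k] << " " << v) : (leaky[k] = v) }

# slow and fast are both {"a"=>"x y", "b"=>"z"}
# orig[0][1] is still "x", but copy[0][1] has become "x y"
```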

> It could of course be argued that both `associate` and `categorize`
> should be added. That may very well be;

Yes.

Actually I want one more method for counting.
(I want 3 methods: grouped hash, normal hash, count hash)
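
For reference, a count hash is usually written today in one of the
following ways. This is only a sketch with made-up data, not a proposed
interface.

```ruby
# Made-up sample data.
words = %w[apple avocado banana blueberry cherry]

# Count hash built directly with a default value of 0.
counts = Hash.new(0)
words.each {|w| counts[w[0]] += 1 }

# The same counts derived from a grouped hash, which builds
# intermediate arrays only to take their sizes.
via_group = Hash[words.group_by {|w| w[0] }.map {|k, ws| [k, ws.size] }]

# counts and via_group are both {"a"=>2, "b"=>2, "c"=>1}
```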

> I just feel that `associate` should
> be added in priority over `categorize`.

matz felt similarly.  [ruby-dev:42643]

But we couldn't find a good name for the normal hash creation method.
So the discussion is pending.

>    * [ruby-talk:344723]
>
>    a=[1,2,5,13]
>    b=[1,1,2,2,2,5,13,13,13]
>    # to
>    dest =
>     [[0, 0], [0, 1], [1, 2], [1, 3], [1, 4], [2, 5], [3, 6], [3, 7], [3, 8]]
>
>    # This can be implemented as:
>     h = a.categorize.with_index {|e, i| [e,i] }
>     b.map.with_index {|e, j| h[e] ? h[e].map {|i| [i,j] } : [] }.flatten(1)
>    # or
>     h = a.each_with_index.associate
>     b.map.with_index{|e, i| [h[e], i] }

Your solution depends on 'a' having no duplicate elements.
Since [ruby-talk:344723] asks about INNER JOINING,
I think 'a' may have duplicate elements.

  a=[1,1]
  b=[1,1]
  # to
  dest = [[0, 0], [1, 0], [0, 1], [1, 1]]
  h = a.categorize.with_index {|e, i| [e,i] }
  p dest == b.map.with_index {|e, j|
    h[e] ? h[e].map {|i| [i,j] } : []
  }.flatten(1)
  #=> true
  h = a.each_with_index.associate
  p dest == b.map.with_index{|e, i| [h[e], i] }
  #=> false
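
The same comparison can be reproduced with existing methods only. In this
sketch the grouped and normal indexes are built by hand, standing in for
what 'categorize' and 'associate' would return.

```ruby
a = [1, 1]
b = [1, 1]
dest = [[0, 0], [1, 0], [0, 1], [1, 1]]

# Grouped index: every position of each element of a is kept.
grouped = {}
a.each_with_index {|e, i| (grouped[e] ||= []) << i }
joined = b.each_with_index.map {|e, j|
  (grouped[e] || []).map {|i| [i, j] }
}.flatten(1)

# Normal index: only the last position of a duplicated element survives.
normal = Hash[a.each_with_index.to_a]
lossy = b.each_with_index.map {|e, j| [normal[e], j] }

# joined == dest is true; lossy is [[1, 0], [1, 1]], so lossy == dest is false
```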
-- 
Tanaka Akira