Issue #10552 has been updated by Martin Dürst.


frequencies is essentially a group_by with each group's values mapped to their size/count.

So assuming something like issue #9970 or issue #7793 gets accepted, it could simply be written as
%w[cat bird bird horse].group_by {|x| x}.map_values {|v| v.count }
or, if we get an identity method (*), as:
%w[cat bird bird horse].group_by(&:identity).map_values &:count

While this may not be very short, it's a concise description of what actually happens. I think it would be better for Ruby to improve how such general transformations can be written, rather than add more and more specialized methods such as (relative_)frequency. Such methods would be better placed in a statistics package (see 10228; that would be good to have, too, of course).

(*) I thought we had an issue for this, but couldn't find it.
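
For anyone who wants to try the group_by decomposition without waiting for map_values or an identity method (both are only proposals in this discussion), here is a minimal sketch that uses nothing but group_by, map and Array#to_h. The variable names are made up for illustration.

```ruby
# map_values and identity are proposals in this thread, not assumed to exist,
# so this sketch relies only on group_by, map, and Array#to_h.
words = %w[cat bird bird horse]

# Step 1: group equal elements together.
grouped = words.group_by { |x| x }
# => {"cat" => ["cat"], "bird" => ["bird", "bird"], "horse" => ["horse"]}

# Step 2: replace each group with its size.
counts = grouped.map { |word, occurrences| [word, occurrences.count] }.to_h
# => {"cat" => 1, "bird" => 2, "horse" => 1}
```

The result keeps first-appearance order, so it is not sorted by count the way the proposed frequencies is.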


----------------------------------------
Feature #10552: [PATCH] Add Enumerable#frequencies and Enumerable#relative_frequencies
https://bugs.ruby-lang.org/issues/10552#change-50155

* Author: Brian Hempel
* Status: Open
* Priority: Normal
* Assignee: 
* Category: core
* Target version: 
----------------------------------------
Counting how many times a value appears in some collection has always been a bit clumsy in Ruby. While Ruby has enough constructs to do it in one line, it still requires knowing the folklore of the optimum solution as well as some acrobatic typing:

~~~ruby
%w[cat bird bird horse].each_with_object(Hash.new(0)) { |word, hash| hash[word] += 1 }
# => {"cat" => 1, "bird" => 2, "horse" => 1}
~~~

What if Ruby could count for us? This patch adds two methods to enumerables:

~~~ruby
%w[cat bird bird horse].frequencies
# => {"bird" => 2, "horse" => 1, "cat" => 1}

%w[cat bird bird horse].relative_frequencies
# => {"bird" => 0.5, "horse" => 0.25, "cat" => 0.25}
~~~

To make programmers happier, the returned hash has the most common values first. This is nice because, for example, finding the most common element of a collection becomes trivial:

~~~ruby
most_common, count = %w[cat bird bird horse].frequencies.first
~~~

Whereas the best you can do with vanilla Ruby is:

~~~ruby
most_common, count = %w[cat bird bird horse].each_with_object(Hash.new(0)) { |word, hash| hash[word] += 1 }.max_by(&:last)

# or...

most_common, count = %w[cat bird bird horse].group_by(&:to_s).map { |word, arr| [word, arr.size] }.max_by(&:last)
~~~
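
The behavior described above can be approximated in plain Ruby. The following is only a rough sketch: the helper names frequencies_sketch and relative_frequencies_sketch are hypothetical, and the code is not taken from the attached patch, whose implementation may differ (in particular in how it orders values with equal counts).

```ruby
# Hypothetical helpers for illustration only; names and implementation are
# assumptions, not the contents of add_enum_frequencies.patch.
def frequencies_sketch(enum)
  counts = enum.each_with_object(Hash.new(0)) { |item, hash| hash[item] += 1 }
  # Most common values first; the order among equal counts is not guaranteed.
  counts.sort_by { |_item, count| -count }.to_h
end

def relative_frequencies_sketch(enum)
  counts = frequencies_sketch(enum)
  total = counts.values.inject(0, :+).to_f
  # Divide each count by the total number of elements.
  counts.map { |item, count| [item, count / total] }.to_h
end

words = %w[cat bird bird horse]

most_common, count = frequencies_sketch(words).first
# most_common is "bird", count is 2

relative = relative_frequencies_sketch(words)
# relative["bird"] is 0.5; relative["cat"] and relative["horse"] are 0.25
```

Standalone helper methods are used here instead of reopening Enumerable, so the sketch cannot clash with a real frequencies method if one is added.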

While I don't like the long method names, "frequencies" and "relative frequencies" are the terms used in basic statistics. http://en.wikipedia.org/wiki/Frequency_%28statistics%29


---Files--------------------------------
add_enum_frequencies.patch (5.81 KB)


-- 
https://bugs.ruby-lang.org/