Issue #10552 has been updated by David Workman.


I like this idea, but I think it could be improved by allowing .frequencies to take a block. With a block, it would count the block's return values instead of the elements themselves, much as .all?, .any? and .none? accept a block.

This would make frequencies useful beyond arrays of strings. It could also be used on more complex data structures, without first calling .map to massage the data into the desired shape.
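Below is a minimal sketch of the block form. It is written as a standalone helper rather than as a change to Enumerable itself, and the module name BlockFrequencies is hypothetical (it is not part of the attached patch). It follows the patch's convention of putting the most common values first. Ties come out in no guaranteed order, because sort_by is not stable.

~~~ruby
# Hypothetical helper illustrating the proposed block form of #frequencies.
# A sketch only, not the patch's implementation.
module BlockFrequencies
  # Counts occurrences of each element. When a block is given, it counts
  # the block's return value for each element instead.
  def self.frequencies(enum, &block)
    counts = Hash.new(0)
    enum.each do |item|
      key = block ? block.call(item) : item
      counts[key] += 1
    end
    # Sort by descending count. Hashes keep insertion order, so the
    # result iterates most common first (ties in unspecified order).
    counts.sort_by { |_key, count| -count }.to_h
  end
end

people = [
  { name: "Ann", city: "Oslo" },
  { name: "Bob", city: "Lima" },
  { name: "Cid", city: "Oslo" }
]

# No .map needed: the block selects the value to count.
BlockFrequencies.frequencies(people) { |person| person[:city] }
# => {"Oslo" => 2, "Lima" => 1}
~~~

Without a block, the helper counts the elements themselves, so plain arrays of strings behave exactly as in the original proposal.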

----------------------------------------
Feature #10552: [PATCH] Add Enumerable#frequencies and Enumerable#relative_frequencies
https://bugs.ruby-lang.org/issues/10552#change-50147

* Author: Brian Hempel
* Status: Open
* Priority: Normal
* Assignee: 
* Category: core
* Target version: 
----------------------------------------
Counting how many times a value appears in some collection has always been a bit clumsy in Ruby. While Ruby has enough constructs to do it in one line, it still requires knowing the folklore of the optimum solution as well as some acrobatic typing:

~~~ruby
%w[cat bird bird horse].each_with_object(Hash.new(0)) { |word, hash| hash[word] += 1 }
# => {"cat" => 1, "bird" => 2, "horse" => 1}
~~~

What if Ruby could count for us? This patch adds two methods to Enumerable:

~~~ruby
%w[cat bird bird horse].frequencies
# => {"bird" => 2, "horse" => 1, "cat" => 1}

%w[cat bird bird horse].relative_frequencies
# => {"bird" => 0.5, "horse" => 0.25, "cat" => 0.25}
~~~
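If you want to try the proposed behavior without applying the patch, here is a rough sketch of both methods as plain module functions. The module name FrequencySketch is hypothetical, and this is not the code in add_enum_frequencies.patch. A relative frequency is a value's count divided by the collection's size, so "bird" gets 2 / 4 = 0.5. Tied values such as "horse" and "cat" may come out in either order, because sort_by is not stable. Hash#transform_values requires Ruby 2.4 or later.

~~~ruby
# Hypothetical sketch of the two proposed methods as module functions.
module FrequencySketch
  # Absolute counts, most common first.
  def self.frequencies(enum)
    counts = enum.each_with_object(Hash.new(0)) { |item, hash| hash[item] += 1 }
    counts.sort_by { |_item, count| -count }.to_h
  end

  # Each count divided by the total number of elements.
  def self.relative_frequencies(enum)
    items = enum.to_a          # enumerate once, then reuse
    total = items.size.to_f    # Float so the division is not integer division
    frequencies(items).transform_values { |count| count / total }
  end
end

words = %w[cat bird bird horse]
FrequencySketch.frequencies(words)          # "bird" => 2 first, then the two ties
FrequencySketch.relative_frequencies(words) # "bird" => 0.5, the others 0.25
~~~

An empty collection yields an empty hash from both methods, so the division by zero is never reached.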

To make programmers happier, the returned hash has the most common values first. This is nice because, for example, finding the most common element of a collection becomes trivial:

~~~ruby
most_common, count = %w[cat bird bird horse].frequencies.first
~~~

By contrast, the best you can do with vanilla Ruby is:

~~~ruby
most_common, count = %w[cat bird bird horse].each_with_object(Hash.new(0)) { |word, hash| hash[word] += 1 }.max_by(&:last)

# or...

most_common, count = %w[cat bird bird horse].group_by(&:to_s).map { |word, arr| [word, arr.size] }.max_by(&:last)
~~~

While I don't like the long method names, "frequencies" and "relative frequencies" are the terms used in basic statistics. http://en.wikipedia.org/wiki/Frequency_%28statistics%29


---Files--------------------------------
add_enum_frequencies.patch (5.81 KB)


-- 
https://bugs.ruby-lang.org/