On Mon, Jan 28, 2013 at 6:39 PM, jooma lavata <lists / ruby-forum.com> wrote:
> I'm learning Ruby and I'm reading some expression that I saw on the
> forum. I'm coming from Javascript. This is really hard for me. Please
> help explain to me in plain English. I understand that it's a Function
> that takes string and count words to return a Hash.
>
> def count_words(string)
>   res = Hash.new(0)
>   string.downcase.scan(/\w+/).map{|word| res[word] =
> string.downcase.scan(/\b#{word}\b/).size}
>   return res
> end

That's not a very idiomatic way to write it, because the result of
map, which returns an array, is ignored. That signals that map is not
the right method to use here. With that said, piece by piece:

string.downcase #=> returns a new string with all the characters downcased
.scan(/\w+/) #=> returns an array of strings, one for each match of the
regular expression. \w+ means: one or more word characters, so this
should return an array of words.
.map #=> returns a new array where each position is filled with the
result of invoking the block with the corresponding element of the
original array. Example:

[1,2,3].map {|x| "x is #{x}"} #=> ["x is 1", "x is 2", "x is 3"]
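
To see the first two steps on a concrete input, here is a small
sketch. The sample sentence and the variable names are made up for
illustration, they are not from your code:

```ruby
# Hypothetical sample input, just for illustration.
sentence = "The cat and the hat"

# Step 1: downcase returns a new, lowercased string.
lowered = sentence.downcase   # "the cat and the hat"

# Step 2: block-less scan collects every match of \w+ into an array.
words = lowered.scan(/\w+/)   # ["the", "cat", "and", "the", "hat"]

puts words.inspect
```

Note that "the" shows up twice in the array, once per occurrence.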

res[word] = string.downcase.scan(/\b#{word}\b/).size

What this means is, take the string, downcase it again, scan it for
the current word surrounded by word boundaries (so, whole word), take
the size of that array and place it in the hash under the key for this
word.
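
A minimal sketch of that inner expression for a single word, assuming
a made-up sample string. It also shows what the \b word boundaries
buy you: without them, "the" would also match inside "then".

```ruby
# Hypothetical sample input, just for illustration.
string = "The cat and then the hat"
word = "the"

# Whole-word matches only: "then" is not counted.
whole = string.downcase.scan(/\b#{word}\b/)   # ["the", "the"]

# Without the boundaries, the "the" inside "then" matches too.
loose = string.downcase.scan(/#{word}/)       # ["the", "the", "the"]

puts whole.size   # 2
puts loose.size   # 3
```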
This is extremely inefficient: for each word it downcases the string
again and then scans the full string again for that word, on top of
the scan you are already doing. So this seems to be O(N^2), where a
single pass through the string should suffice. Also, using the
block-less form of scan together with map like that creates many
intermediate objects that are never used.
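
To make the wasted work visible, here is a sketch that captures the
array map builds instead of throwing it away. The input string and
the name "ignored" are made up for illustration:

```ruby
# Hypothetical sample input, just for illustration.
string = "a b a"
res = Hash.new(0)

# Same expression as in the original method, but we keep map's result.
# An assignment evaluates to the assigned value, so map collects counts.
ignored = string.downcase.scan(/\w+/).map { |word|
  res[word] = string.downcase.scan(/\b#{word}\b/).size
}

puts ignored.inspect   # [2, 1, 2]
puts res.inspect       # {"a"=>2, "b"=>1}
```

The word "a" is rescanned and reassigned once per occurrence, and the
array of counts is built only to be discarded.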

I'd do something like:

def count_words(string)
  res = Hash.new(0)
  string.downcase.scan(/\w+/) {|word| res[word] += 1}
  return res
end

This uses the block form of scan, which, instead of building an array,
just yields each match to the block. Since we were not doing anything
with that array, this is more efficient. We take advantage of the
hash's default value, which is set to 0, to simply increment the count
for each word.
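
A quick sketch of the single-pass counting on a made-up sentence,
including what the default value does for a word that was never seen:

```ruby
# Hash.new(0) returns 0 for any missing key instead of nil.
counts = Hash.new(0)

# Hypothetical sample input; the block form of scan yields each match.
"The cat and the hat".downcase.scan(/\w+/) { |word| counts[word] += 1 }

puts counts.inspect        # {"the"=>2, "cat"=>1, "and"=>1, "hat"=>1}

# Reading a missing key gives the default and does not store the key.
puts counts["dog"]         # 0
puts counts.key?("dog")    # false
```

With a plain {} the first res[word] += 1 would fail, because the
missing key gives nil and nil has no + method.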

Hope this helps,

Jesus.