Issue #6261 has been updated by matz (Yukihiro Matsumoto).

Status changed from Open to Rejected

Use Enumerable#lazy.

Matz.

----------------------------------------
Feature #6261: Enumerable#emap and Enumerable#egrep
https://bugs.ruby-lang.org/issues/6261#change-25674

Author: yimutang (Joey Zhou)
Status: Rejected
Priority: Normal
Assignee: 
Category: 
Target version: 


I was inspired by Ruby 1.9.x's Enumerable#chunk and #slice_before, which both take a block and return an enumerator. I would like to introduce two new methods into the core Enumerable module, which can be implemented in Ruby like this:



module Enumerable
  
  def emap # returns an enumerator; the block runs only as elements are requested
    raise ArgumentError, 'no block given' unless block_given?
    
    Enumerator.new do |yielder|
      self.each do |elem|
        mapped = yield elem
        yielder << mapped
      end
    end
  end
  
  
  def egrep # returns an enumerator of the elements for which the block is truthy
    raise ArgumentError, 'no block given' unless block_given?
    
    Enumerator.new do |yielder|
      self.each do |elem|
        allowed = yield elem
        yielder << elem if allowed
      end
    end
  end
  
end



#emap + #to_a is just like #map / #collect, and #egrep + #to_a is just like #select. Why do I think it's necessary to introduce these methods? Because #collect and #select are sometimes inefficient. Here's a weird example:



lines = File.foreach('a_very_large_file')
            .egrep {|line| line.length < 10 }
            .emap {|line| line.chomp!; line }
            .each_slice(3)
            .emap {|lines| lines.join(';').downcase }
            .take_while {|line| line.length > 20 }



The above code means: take each line from 'a_very_large_file', keep only the lines whose length is less than 10, chomp each line that passes, join every 3 of them into one group, and finally stop as soon as a joined line has a length of 20 or less.

If you replace #egrep with #select and #emap with #collect, you must iterate over all the lines of 'a_very_large_file' and create a temporary array, 3 times! That is not efficient in this situation, because the #take_while means 'I do not want to check all lines'.
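
To make the difference concrete, here is a minimal sketch that counts how many elements each style pulls from its source. The `source` enumerator and the `pulled` counter are made up for illustration, #emap and #egrep are the methods proposed in this issue (not core Ruby), and the last variant assumes Ruby 2.0 or later, where Enumerable#lazy is available.

```ruby
# Sketch only: emap/egrep are the methods proposed in this issue, not core Ruby.
module Enumerable
  def emap
    raise ArgumentError, 'no block given' unless block_given?
    Enumerator.new do |yielder|
      each { |elem| yielder << yield(elem) }
    end
  end

  def egrep
    raise ArgumentError, 'no block given' unless block_given?
    Enumerator.new do |yielder|
      each { |elem| yielder << elem if yield(elem) }
    end
  end
end

# Hypothetical source that counts how many elements were requested from it.
pulled = 0
source = Enumerator.new do |yielder|
  1.upto(1_000) do |n|
    pulled += 1
    yielder << n
  end
end

# Eager chain: #select walks the whole source and builds a temporary array.
eager = source.select { |n| n.even? }
              .map { |n| n * n }
              .take_while { |sq| sq < 50 }
eager_pulled = pulled

# Proposed chain: elements flow one at a time, so #take_while can stop early.
pulled = 0
chained = source.egrep { |n| n.even? }
                .emap { |n| n * n }
                .take_while { |sq| sq < 50 }
chained_pulled = pulled

# Enumerable#lazy (Ruby 2.0+) gives the same early exit with the usual names.
pulled = 0
lazy = source.lazy
             .select { |n| n.even? }
             .map { |n| n * n }
             .take_while { |sq| sq < 50 }
             .to_a
lazy_pulled = pulled

puts "eager:   #{eager.inspect} after #{eager_pulled} elements"
puts "chained: #{chained.inspect} after #{chained_pulled} elements"
puts "lazy:    #{lazy.inspect} after #{lazy_pulled} elements"
```

All three chains produce the same result; only the number of elements requested from the source differs.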

If you want to avoid #select and #collect, you can write it like this:



File.foreach('a_very_large_file') do |line|
  # blah blah to achieve the same goal
end



I'm afraid it's hard to make that code clear at a glance.
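
As a rough illustration of what such a hand-written loop might look like, here is one possible sketch. It uses a StringIO with made-up contents as a stand-in for the file, and the names `group`, `results`, and `stopped` are hypothetical; other ways of writing the loop are certainly possible.

```ruby
require 'stringio'

# Stand-in for File.foreach('a_very_large_file'); the contents are made up.
lines = ['ABCDEFGH', 'this line is far too long', 'IJKLMNOP', 'QRSTUVWX',
         'ab', 'cd', 'ef', 'YZABCDEF', 'GHIJKLMN', 'OPQRSTUV']
input = StringIO.new(lines.join("\n") + "\n")

results = []
group = []
stopped = false

input.each_line do |line|
  next unless line.length < 10      # the filtering step
  group << line.chomp               # the chomp step
  next if group.size < 3            # keep collecting until 3 lines are grouped

  joined = group.join(';').downcase # the join/downcase step
  group = []
  if joined.length > 20             # the take_while condition
    results << joined
  else
    stopped = true
    break
  end
end

# #each_slice also yields a final, shorter group, so handle the leftovers.
unless stopped || group.empty?
  joined = group.join(';').downcase
  results << joined if joined.length > 20
end

p results
```

The filtering, grouping, and stopping logic end up interleaved in one block, with extra state for the partial group, which is the readability cost described above.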

So you can see that #egrep and #emap are very useful.

Another example: I want to write a class, FreqDist, that records the frequency distribution of a population of samples.



class FreqDist
  
  def initialize(samples)
    @sample_dict = Hash.new(0)
    samples.each {|sample| @sample_dict[sample] += 1 }
  end
  
end



I want to use FreqDist to store the frequency distribution of a list of words, but there is a case problem: 'When' and 'when' should not be regarded as two different samples. I can handle it like this:

fd = FreqDist.new(words.emap {|w| w.downcase })

This passes an enumerator instead of an array as the argument, so the words are iterated once and no temporary array is created.
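
For reference, the same idea can be sketched with Enumerable#lazy, which is what the resolution at the top of this thread suggests in place of #emap. This assumes Ruby 2.0 or later; the `sample_dict` reader and the word list are added here only so the counts can be inspected and are not part of the original class.

```ruby
class FreqDist
  # Hypothetical reader, added only so the counts can be inspected.
  attr_reader :sample_dict

  def initialize(samples)
    @sample_dict = Hash.new(0)
    samples.each { |sample| @sample_dict[sample] += 1 }
  end
end

# Made-up sample data.
words = %w[When the cat sleeps when The dog barks]

# words.lazy.map returns a lazy enumerator; each word is downcased only
# when FreqDist#initialize iterates, so no intermediate array is built.
fd = FreqDist.new(words.lazy.map { |w| w.downcase })

p fd.sample_dict
```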

Well, in my opinion, #emap and #egrep are very powerful. Although I can implement them in Ruby and put them in a custom gem, I think it's better to introduce them into the core Enumerable module.

Please consider the suggestion. Thank you!


-- 
http://bugs.ruby-lang.org/