Robert Klemme wrote:
> On 30.05.2007 21:54, SonOfLilit wrote:
>> Following is ruby-prof output. First, a few words:
>>
>> 4.75 against 1.56 isn't too good, but it isn't ten times slower. I
>> think it can be excused (and maybe the code can be optimized a bit) in
>> cases where you'd process huge lists with ruby of all languages.
>>
>> This is the code used to profile:
>>
>> require 'rubygems'
>> require 'ruby-prof'
>>
>> module Enumerable
>> def serially(&b)
>>   a = []
>>   self.each{|x| t = [x].instance_eval(&b); a << t[0] unless t.empty?}
>>   a
>> end
>> end
>>
>> # Profile the code
>> result = RubyProf.profile{
>> a = (1..50000)
>> b = a.serially{select{|x| x>280}.collect{|x|"0x" <<
>> x.to_s}.select{|x| x != "0x15"}}
>> }
>>
>> # Print a graph profile to text
>> printer = RubyProf::GraphPrinter.new(result)
>> printer.print(STDOUT, 0)
>>
>> result = RubyProf.profile{
>> a = (1..50000)
>> b = a.select{|x| x>280}.collect{|x|"0x" << x.to_s}.select{|x| x != 
>> "0x15"}
>> }
>>
>> # Print a graph profile to text
>> printer = RubyProf::GraphPrinter.new(result)
>> printer.print(STDOUT, 0)
> 
> You are changing subject in between:  Originally you started out with 
> memory consumption being the issue.  Now you talk about speed.  Your 
> solution is neither fast nor easy on the memory.  For speed see the 
> benchmark below.  It's not easy on the memory because you still create a 
> *ton* of temporary one element arrays (four per iteration if I'm not 
> mistaken) - this avoids the single large copy but allocating and GC'ing 
> all these temporaries imposes a significant overhead.  The obvious and 
> simple solution here is to do it all in *one* iteration.  So why invent 
> something new if there is an easy and efficient solution available?
> 
> Kind regards
> 
>     robert
> 
> 
> 10:38:35 [Temp]: ./ser.rb
> Rehearsal --------------------------------------------------
> classic          3.359000   0.000000   3.359000 (  3.338000)
> serially        13.719000   0.016000  13.735000 ( 13.747000)
> serially!       13.328000   0.000000  13.328000 ( 13.361000)
> inject           4.297000   0.000000   4.297000 (  4.339000)
> each             3.297000   0.000000   3.297000 (  3.273000)
> ---------------------------------------- total: 38.016000sec
> 
>                      user     system      total        real
> classic          3.203000   0.000000   3.203000 (  3.265000)
> serially        12.891000   0.000000  12.891000 ( 13.233000)
> serially!       12.438000   0.000000  12.438000 ( 12.617000)
> inject           3.937000   0.000000   3.937000 (  3.985000)
> each             2.937000   0.000000   2.937000 (  3.008000)
> 
> 

Now who's changing the subject? Of course if you fold all your desired 
operations into a single each block it will be faster. The question
is whether you can find a pattern that differs only marginally from the
classic chained pattern but is faster and more memory efficient.
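
To make the comparison concrete, here is a minimal sketch of the two
approaches behind the "classic" and "each" entries in the benchmark above:
the chained version allocates an intermediate array per step, while the
fused version does all the work in one pass. Variable names are mine, not
from either of the original scripts.

```ruby
range = (1..50_000)

# classic: three passes over the data, two intermediate arrays
classic = range.select  { |x| x > 280 }.
                collect { |x| "0x" << x.to_s }.
                select  { |x| x != "0x15" }

# fused: one pass, one result array, no temporaries
fused = []
range.each do |x|
  next unless x > 280
  s = "0x" << x.to_s
  fused << s unless s == "0x15"
end

raise "results differ" unless classic == fused
```

Both produce the same result; the fused loop just trades the readable
pipeline for a single iteration.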

As I pointed out in a previous post, C++ uses expression templates to
effect exactly the kind of optimisation being talked about here. Perhaps
an on-the-fly code generation trick that produces this optimal each
block would be worth it if the data set is large enough.
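
A rough Ruby analogue of the expression-template idea, without actual code
generation: record the pipeline as composed closures and only iterate once,
when the result is demanded. The Lazy class and its method names below are
hypothetical, purely for illustration of the fusion technique.

```ruby
class Lazy
  include Enumerable

  def initialize(source, &step)
    @source = source
    # The identity step just passes each element through.
    @step = step || lambda { |x, &emit| emit.call(x) }
  end

  # Compose a filtering step onto the pipeline without iterating yet.
  def select(&pred)
    prev = @step
    Lazy.new(@source) { |x, &emit| prev.call(x) { |y| emit.call(y) if pred.call(y) } }
  end

  # Compose a mapping step, again without iterating.
  def collect(&f)
    prev = @step
    Lazy.new(@source) { |x, &emit| prev.call(x) { |y| emit.call(f.call(y)) } }
  end

  # Only here do we actually traverse the source -- once, with no
  # intermediate arrays.
  def each(&blk)
    @source.each { |x| @step.call(x, &blk) }
  end
end

result = Lazy.new(1..50_000).
  select  { |x| x > 280 }.
  collect { |x| "0x" << x.to_s }.
  select  { |x| x != "0x15" }.
  to_a
```

The chain reads like the classic pattern but compiles down to a single
each pass, which is essentially what expression templates buy you in C++.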

B