Sean O'Halpin wrote:
> On 10/28/07, Mohit Sindhwani <mo_mail / onghu.com> wrote:
>   
>> I so have to get the hang of inject, flatten and map.
>>
>> Cheers,
>> Mohit.
>> 10/28/2007 | 9:16 PM.
>>     
>
> Hi,
>
> They are definitely worth looking into - inject in particular is a
> powerful tool (Robert Klemme can make it do anything!). However, the
> following benchmark shows that a slight modification of your approach
> is actually pretty efficient. (The modification is to store the
> duplicates in a hash rather than an array so you can return the list
> of duplicates using Hash#keys).
>
> Regards,
> Sean
>
> # Mohit Sindhwani (with slight adjustment)
> def duplicates_1(array)
>   seen = { }
>   duplicates = { }
>   array.each {|item| seen.key?(item) ? duplicates[item] = true : seen[item] = true}
>   duplicates.keys
> end
>
> # Robert Klemme
> def duplicates_2(array)
>   array.inject(Hash.new(0)) {|ha,e| ha[e]+=1;ha}.delete_if {|k,v| v==1}.keys
> end
>
> # from facets
> def duplicates_3(array)
>   array.inject(Hash.new(0)){|h,v| h[v]+=1; h}.reject{|k,v| v==1}.keys
> end
>
> require 'benchmark'
>
> def do_benchmark(title, n, methods, *args, &block)
>   puts '-' * 40
>   puts title
>   puts '-' * 40
>   Benchmark.bm(methods.map{ |x| x.to_s.length}.max + 2) do |x|
>     methods.each do |meth|
>       x.report(meth.to_s) { n.times do send(meth, *args, &block) end }
>     end
>   end
> end
>
> # get some data (Ubuntu specific I guess - YMMV)
> array = File.read('/etc/dictionaries-common/words').split(/\n/)
>
> # test w/o dups
> do_benchmark('no duplicates', 10, [:duplicates_1, :duplicates_2, :duplicates_3], array)
>
> # create some duplicates
> array = array[0..999] * 100
> do_benchmark('duplicates', 10, [:duplicates_1, :duplicates_2, :duplicates_3], array)
>
> __END__
> $ ruby bm-duplicates.rb
> ----------------------------------------
> no duplicates
> ----------------------------------------
>                     user     system      total        real
> duplicates_1    2.200000   0.010000   2.210000 (  2.215057)
> duplicates_2    5.820000   0.000000   5.820000 (  5.812414)
> duplicates_3    6.580000   0.010000   6.590000 (  6.586708)
> ----------------------------------------
> duplicates
> ----------------------------------------
>                     user     system      total        real
> duplicates_1    1.560000   0.000000   1.560000 (  1.562587)
> duplicates_2    2.660000   0.000000   2.660000 (  2.665301)
> duplicates_3    2.590000   0.000000   2.590000 (  2.595189)
>

Thanks, Sean!  That makes me feel quite good about my approach.

So, hash lookups are faster than searching an array?

Cheers,
Mohit.
10/29/2007 | 2:13 AM.
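For what it's worth, the speed difference in duplicates_1 comes from Hash#key? being an average constant-time lookup, whereas keeping the "seen" items in an Array and testing with Array#include? rescans the list on every element. A minimal sketch comparing the two membership strategies (method and variable names here are illustrative, not from the thread):

```ruby
require 'benchmark'

# Array-based "seen" list: include? scans linearly on every check.
def duplicates_with_array(array)
  seen = []
  dups = []
  array.each { |item| seen.include?(item) ? dups << item : seen << item }
  dups.uniq
end

# Hash-based "seen" set: key? is an average O(1) lookup.
def duplicates_with_hash(array)
  seen = {}
  dups = {}
  array.each { |item| seen.key?(item) ? dups[item] = true : seen[item] = true }
  dups.keys
end

data = (1..2_000).to_a * 2   # every element duplicated once

Benchmark.bm(16) do |x|
  x.report('array include?') { duplicates_with_array(data) }
  x.report('hash key?')      { duplicates_with_hash(data) }
end
```

Both return the same duplicates; the hash version just avoids the quadratic rescan, which is why it pulls ahead as the input grows.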