On 10/28/07, Mohit Sindhwani <mo_mail / onghu.com> wrote:
> I so have to get the hang of inject, flatten and map.
>
> Cheers,
> Mohit.
> 10/28/2007 | 9:16 PM.

Hi,

They are definitely worth looking into - inject in particular is a
powerful tool (Robert Klemme can make it do anything!). However, the
following benchmark shows that a slight modification of your approach
is actually pretty efficient. (The modification is to store the
duplicates in a hash rather than an array so you can return the list
of duplicates using Hash#keys).

Regards,
Sean

# Mohit Sindhwani (with slight adjustment)
def duplicates_1(array)
  seen = { }
  duplicates = { }
  array.each {|item| seen.key?(item) ? duplicates[item] = true :
seen[item] = true}
  duplicates.keys
end

# Robert Klemme
def duplicates_2(array)
  array.inject(Hash.new(0)) {|ha,e| ha[e]+=1;ha}.delete_if {|k,v| v==1}.keys
end

# from facets
def duplicates_3(array)
  array.inject(Hash.new(0)){|h,v| h[v]+=1; h}.reject{|k,v| v==1}.keys
end

require 'benchmark'

def do_benchmark(title, n, methods, *args, &block)
  puts '-' * 40
  puts title
  puts '-' * 40
  Benchmark.bm(methods.map{ |x| x.to_s.length}.max + 2) do |x|
    methods.each do |meth|
      x.report(meth.to_s) { n.times do send(meth, *args, &block) end }
    end
  end
end

# get some data (Ubuntu specific I guess - YMMV)
array = File.read('/etc/dictionaries-common/words').split(/\n/)

# test w/o dups
do_benchmark('no duplicates', 10, [:duplicates_1, :duplicates_2,
:duplicates_3], array)

# create some duplicates
array = array[0..999] * 100
do_benchmark('duplicates', 10, [:duplicates_1, :duplicates_2,
:duplicates_3], array)

__END__
$ ruby bm-duplicates.rb
----------------------------------------
no duplicates
----------------------------------------
                    user     system      total        real
duplicates_1    2.200000   0.010000   2.210000 (  2.215057)
duplicates_2    5.820000   0.000000   5.820000 (  5.812414)
duplicates_3    6.580000   0.010000   6.590000 (  6.586708)
----------------------------------------
duplicates
----------------------------------------
                    user     system      total        real
duplicates_1    1.560000   0.000000   1.560000 (  1.562587)
duplicates_2    2.660000   0.000000   2.660000 (  2.665301)
duplicates_3    2.590000   0.000000   2.590000 (  2.595189)