On 10/28/07, Mohit Sindhwani <mo_mail / onghu.com> wrote: > I so have to get the hang of inject, flatten and map. > > Cheers, > Mohit. > 10/28/2007 | 9:16 PM. Hi, They are definitely worth looking into - inject in particular is a powerful tool (Robert Klemme can make it do anything!). However, the following benchmark shows that a slight modification of your approach is actually pretty efficient. (The modification is to store the duplicates in a hash rather than an array so you can return the list of duplicates using Hash#keys). Regards, Sean # Mohit Sindhwani (with slight adjustment) def duplicates_1(array) seen = { } duplicates = { } array.each {|item| seen.key?(item) ? duplicates[item] = true : seen[item] = true} duplicates.keys end # Robert Klemme def duplicates_2(array) array.inject(Hash.new(0)) {|ha,e| ha[e]+=1;ha}.delete_if {|k,v| v==1}.keys end # from facets def duplicates_3(array) array.inject(Hash.new(0)){|h,v| h[v]+=1; h}.reject{|k,v| v==1}.keys end require 'benchmark' def do_benchmark(title, n, methods, *args, &block) puts '-' * 40 puts title puts '-' * 40 Benchmark.bm(methods.map{ |x| x.to_s.length}.max + 2) do |x| methods.each do |meth| x.report(meth.to_s) { n.times do send(meth, *args, &block) end } end end end # get some data (Ubuntu specific I guess - YMMV) array = File.read('/etc/dictionaries-common/words').split(/\n/) # test w/o dups do_benchmark('no duplicates', 10, [:duplicates_1, :duplicates_2, :duplicates_3], array) # create some duplicates array = array[0..999] * 100 do_benchmark('duplicates', 10, [:duplicates_1, :duplicates_2, :duplicates_3], array) __END__ $ ruby bm-duplicates.rb ---------------------------------------- no duplicates ---------------------------------------- user system total real duplicates_1 2.200000 0.010000 2.210000 ( 2.215057) duplicates_2 5.820000 0.000000 5.820000 ( 5.812414) duplicates_3 6.580000 0.010000 6.590000 ( 6.586708) ---------------------------------------- duplicates ---------------------------------------- user system total real duplicates_1 1.560000 0.000000 1.560000 ( 1.562587) duplicates_2 2.660000 0.000000 2.660000 ( 2.665301) duplicates_3 2.590000 0.000000 2.590000 ( 2.595189)