Dominik Bathon wrote: > Here is my solution: > > The result: > > $ time ruby sample.rb 5_000_000 1_000_000_000 > big_sample > > real 0m30.493s > user 0m29.435s > sys 0m0.681s > > on a Pentium M 1500MHz, 512MB RAM $ time ruby Sampling.rb 5_000_000 1_000_000_000 > big_sample real 5m20.404s user 5m18.600s sys 0m1.480s on a 1.2 GHz Duron with 512 MB RAM. The odd thing is, that I could not see where to optimise to get near the below 1 minute border. In the end I tried 5 approaches (one being inspired by Dominiks idea with the Hash, one by Joost). I wrote a simple class to hold the different methods I used, where @count is the number of members to be reported, @limit is the upper bound of the range. @numbers is an Array and @numberHash a - guess - Hash. The obvious method (at least for me) was to insert a random number into an Array, checking whether the Array did not include the number and redo-ing if it was already included. def generate_with_array_include @numbers.clear @count.times do newNumber = rand(@limit) if @numbers.include?(newNumber) redo else @numbers << newNumber end end end A different way to do the same is def generate_with_uniq_first @numbers.clear @count.times do @numbers << rand(@limit) end @numbers.uniq! while @numbers.length < @count @numbers << rand(@limit) @numbers.uniq! end end The idea behind this is, that with a big difference between @count and @limit there is a slight chance, that no duplicate numbers will be returned by Kernel#rand. That needs to be checked - so call Array#uniq! to delete the duplicates. After that simply loop until the Array is full (means, it has the required number of elements), calling uniq! in each iteration. That performed better than 'generate_with_array_include' (30 times faster). def generate_with_uniq_second @numbers.clear while @numbers.length < @count (@count - @numbers.length).times do @numbers << rand(@limit) end @numbers.uniq! end end This is even faster (takes only half the time as 'generate_with_uniq_first') and looks better to me. The 4th method uses the Hash approach Dominik chose. def generate_with_hash_has_key @numberHash.clear @count.times do newNumber = rand(@limit) if @numberHash.has_key?(newNumber) redo else @numberHash[newNumber] = nil end end @numbers = @numberHash.keys end This is an adaptation to using hash instead of Array from my initial algorithm. This method is slightly faster than my second try with uniq! - with @count being 5_000 and @limit 1_000_000. With bigger numbers the difference grows. The last line stores the keys in the @numbers-Array, so I didn't need to change my output method - which looks like that: def to_s @numbers.sort.join("\n") end The 5th method is inspired by Joosts Hash#length idea. After I read it I thought "This is so simple - why did you spend half the day thinking about how to reach their 'below 1 minute' timings?". def generate_with_hash_length while @numberHash.length < @count @numberHash[rand(@limit)] = nil end @numbers = @numberHash.keys end And this is the winner (from what I tried). Here are some timings (without output) from a smaller sample (look at 'generate_with_array_include', I really didn't want to use this with bigger numbers) ruby -v Sampling.rb 50_000 10_000_000 ruby 1.8.2 (2005-04-11) [i386-linux] user system total real generate_with_uniq_first 17.790000 0.020000 17.810000 ( 17.806102) generate_with_uniq_second 0.950000 0.010000 0.960000 ( 0.956298) generate_with_array_include 243.300000 0.090000 243.390000 (243.969158) generate_with_hash_has_key 0.310000 0.000000 0.310000 ( 0.308837) generate_with_hash_length 0.020000 0.000000 0.020000 ( 0.019657) And here some timings (without output as well) from the top3 methods (using the original parameters) ruby -v Sampling.rb 5_000_000 1_000_000_000 ruby 1.8.2 (2005-04-11) [i386-linux] user system total real generate_with_uniq_second 95.240000 1.070000 96.310000 ( 96.511207) generate_with_hash_has_key 63.570000 0.970000 64.540000 ( 64.641186) generate_with_hash_length 5.720000 0.000000 5.720000 ( 5.718732) So, even my best method using Array and uniq! turned out to be not suited. Well, at the end I discovered the bmbm method of Benchmark. It gives different results for the second run: ruby -v Sampling.rb 5_000_000 1_000_000_000 ruby 1.8.2 (2005-04-11) [i386-linux] Rehearsal -------------------------------------------------------------- generate_with_uniq_second 95.640000 0.850000 96.490000 ( 96.563648) generate_with_hash_has_key 63.280000 1.270000 64.550000 ( 64.687427) generate_with_hash_length 5.720000 0.000000 5.720000 ( 5.718150) --------------------------------------------------- total: 166.760000sec user system total real generate_with_uniq_second 99.790000 0.740000 100.530000 (100.740020) generate_with_hash_has_key 45.050000 0.830000 45.880000 ( 45.987040) generate_with_hash_length 3.130000 0.000000 3.130000 ( 3.132459) I would have never thought about using a Hash in this quiz. I really do not need the value of any stored key. It is an Array-problem. List of numbers. Sorted. (unique) Why use a Hash? Some of you did use it. Do you mind telling me why? Anyway, it was a great quiz. Final timings, now using 'generate_with_hash_length': time ruby -d Sampling.rb 5_000_000 1_000_000_000 > big_sample_new real 1m17.537s user 0m58.200s sys 0m1.180s Stefan Mahlitz