Hi,

On a 2.4 GHz Celeron, I got:

real    0m55.169s
user    0m53.860s
sys     0m1.210s

...after adding GC.disable, I get:

real    0m31.565s
user    0m30.500s
sys     0m1.060s


This was my final solution (sans GC.disable, since I'd
forgotten about that until reading Dominik's solution).

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#!/usr/bin/env ruby

num_samples = ARGV.shift.to_i
upper_bound = ARGV.shift.to_i

uniq = {}
data = []

warn "calc..."

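num_samples.times do
  r = rand(upper_bound)
  if uniq[r]
    # Collision: step to the next free slot, wrapping around at upper_bound.
    num_samples.times do
      r = 0 if (r += 1) >= upper_bound
      break if uniq[r].nil?
    end
  end
  # Record the sample in both the lookup hash and the output array.
  data << (uniq[r] = r)
end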

warn "sort..."
data.sort!

warn "stringify..."
data.map! {|n| n.to_s }

warn "join..."
res = data.join("\n")

warn "output..."
puts res

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I append the samples to the data array as they are produced,
because it seemed to be faster than calling Hash#keys afterward.
(It may not have been very significant... I don't remember.)
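
For comparison, here is a rough sketch of the two ways of collecting
the samples. The method names (next_free, sample_appending,
sample_via_keys) are made up for illustration, and the probing loop is
simplified to a plain while loop, which assumes num_samples is no
larger than upper_bound.

```ruby
# Hypothetical helper: walk forward from r to the next unused slot,
# wrapping around at upper_bound. Assumes at least one slot is free.
def next_free(uniq, r, upper_bound)
  while uniq[r]
    r += 1
    r = 0 if r >= upper_bound
  end
  r
end

# Variant 1: append each sample to an array as it is produced.
def sample_appending(num_samples, upper_bound)
  uniq = {}
  data = []
  num_samples.times do
    r = next_free(uniq, rand(upper_bound), upper_bound)
    data << (uniq[r] = r)
  end
  data
end

# Variant 2: only fill the hash, then pull the keys out at the end.
def sample_via_keys(num_samples, upper_bound)
  uniq = {}
  num_samples.times do
    r = next_free(uniq, rand(upper_bound), upper_bound)
    uniq[r] = true
  end
  uniq.keys
end
```

Both return num_samples distinct integers below upper_bound; which one
is faster likely depends on the Ruby version, so it is worth timing
both before picking one.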

I fiddled with how to output the data for a while... puts of
the entire string was lightning fast, and joining an array
of strings is pretty fast (considerably faster than joining
an array of fixnums, surprisingly).  Mapping the array of
fixnums to strings explicitly prior to calling #join seemed
faster than letting #join do the conversion (though this
seems strange/counterintuitive to me).
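
A small way to check the #join observation on a given machine is
sketched below. The method names and the array size are arbitrary
choices for illustration; the timings will vary with the Ruby version
and hardware, so the numbers from 2005 may not reproduce.

```ruby
require 'benchmark'

# Let Array#join convert each integer to a string itself.
def join_direct(nums)
  nums.join("\n")
end

# Convert to strings explicitly first, then join the strings.
def join_after_map(nums)
  nums.map { |n| n.to_s }.join("\n")
end

nums = Array.new(200_000) { |i| i * 7 }

Benchmark.bm(16) do |bm|
  bm.report("join directly")  { join_direct(nums) }
  bm.report("map, then join") { join_after_map(nums) }
end
```

Both methods produce the same string, so only the elapsed time differs.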

My method of "finding the next slot" in a loop on a collision
may be overly cheesy - I don't know.  I did consider just
asking for a new sample until finding one that didn't collide
(I like Joost's "dumping the values in and checking the
length" approach).  However, I've tended to avoid that approach:
when the number of samples desired approaches the
upper_bound (e.g. 5_000_000 5_000_000, worst case), I've been
burned in the past by such algorithms working very hard to find
the last few empty slots.  It clearly wasn't a problem for the
5_000_000 1_000_000_000 parameters for this quiz, though - I
just wanted to explain why I did it differently.
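
To make the worst case concrete, here is a minimal sketch of the
"keep drawing until the hash is big enough" idea, with a counter for
how many draws it takes. This is not Joost's actual code, just an
illustration of the general approach; the method name and the draw
counter are made up for this example.

```ruby
# Draw random values until num_samples distinct ones have been seen.
# Returns the distinct values and the total number of draws needed.
# Assumes num_samples <= upper_bound, otherwise it never finishes.
def sample_by_retrying(num_samples, upper_bound)
  uniq = {}
  draws = 0
  while uniq.size < num_samples
    uniq[rand(upper_bound)] = true   # a repeat just overwrites the key
    draws += 1
  end
  [uniq.keys, draws]
end

# Sparse case: few collisions, draws stays close to num_samples.
_, sparse_draws = sample_by_retrying(1_000, 1_000_000)

# Dense case: every slot must be hit, so the last few take many tries.
_, dense_draws = sample_by_retrying(1_000, 1_000)

warn "sparse: #{sparse_draws} draws, dense: #{dense_draws} draws"
```

When num_samples equals upper_bound this is the classic coupon
collector situation, where the expected number of draws grows like
n * ln(n): on the order of 80 million draws for n = 5_000_000.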

Here's the output -

$ time ruby sample-c.rb 5_000_000 1_000_000_000 > big_sample-c.txt
calc...
sort...
stringify...
join...
output...

real    0m55.169s
user    0m53.860s
sys     0m1.210s

$ wc big_sample-c.txt
 5000000  5000000 49445562 big_sample-c.txt

$ head big_sample-c.txt
13
41
870
1225
1281
1434
1649
1921
1991
3047

$ tail big_sample-c.txt
999997887
999998139
999998335
999998632
999998893
999998947
999999169
999999219
999999271
999999587


Thanks for the fun quiz!  It's the first one I've completed.


Regards,

Bill