Here is my solution:

The result:

$ time ruby sample.rb 5_000_000 1_000_000_000 > big_sample

real    0m30.493s
user    0m29.435s
sys     0m0.681s
$ ll big_sample
-rw-r--r--  1 dba users 49443878 17. Jul 11:29 big_sample
$ wc -l big_sample
5000000 big_sample
$ uniq big_sample | wc -l
5000000
$ head big_sample
624
756
1293
1506
1607
1627
2406
2979
3217
3396
$ tail big_sample
999998726
999998795
999998805
999998901
999998957
999999083
999999114
999999193
999999353
999999514

on a Pentium M 1500MHz, 512MB RAM


The algorithm:

Nothing special. Just store the seen values in a hash, check whether each
new value is already in the hash and, when finished, return the sorted keys
of the hash. This needs about 19s on my machine (roughly 14s for the
sampling and 5s for the sort).


The (real) problem:

Writing the results to stdout. As it turned out, this is very slow if you
just do x.each { |el| puts el }. After some experiments I figured out
that garbage collection is the problem.
Each puts el generates two new Strings (I didn't really check that): el.to_s
and "\n". If you do this 5000000 times, the garbage collector is triggered
very often. It gets faster if you do something like this:

nl = "\n"
x.each { |el| print el, nl }
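
The following is a minimal sketch for checking the claim above, not part of
sample.rb. It assumes a Ruby recent enough to provide
GC.stat(:total_allocated_objects) (MRI 2.2 or later); the helper name
"allocations" is made up for illustration. It confirms that both loops write
the same bytes and reports how many objects each one allocated. The counts
differ between Ruby versions, so none are quoted here.

```ruby
require "stringio"

# Hypothetical helper: run a block and return how many objects were
# allocated while it ran (assumes MRI 2.2+ for this GC.stat key).
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

values = [624, 756, 1293]   # first three lines of the sample output above

out_puts  = StringIO.new
out_print = StringIO.new
nl = "\n"

a_puts  = allocations { values.each { |el| out_puts.puts el } }
a_print = allocations { values.each { |el| out_print.print el, nl } }

# Both variants must produce identical output.
same = out_puts.string == out_print.string
STDERR.puts "puts: #{a_puts} objects, print: #{a_print} objects, same output: #{same}"
```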

But the "real solution" is GC.disable. That has a downside of course. The  
above run of sample.rb needs approx. 400MB of RAM. So, don't try this at  
home if you have less than 512MB ;-)
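
If switching the collector off for the whole process is too risky, it can be
limited to one block. This is only a sketch under that assumption; the
helper name "without_gc" is hypothetical and does not appear in sample.rb.
It relies on GC.disable returning true when collection was already disabled,
so a nested call does not switch it back on too early.

```ruby
# Hypothetical helper: run a block with garbage collection switched off,
# and restore the previous state afterwards, even if the block raises.
def without_gc
  was_disabled = GC.disable   # true if GC was already disabled before this call
  yield
ensure
  GC.enable unless was_disabled
end

# Usage: the strings created here are not collected until the block is done.
lines = without_gc { (1..5).map { |n| n.to_s } }
```

Memory still grows for as long as the block runs, so the 400MB caveat applies
to whatever work is wrapped.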


I also have a debug/benchmark feature that prints the elapsed time at the
end of each phase. Just run:

$ time ruby -d sample.rb 5_000_000 1_000_000_000 > big_sample
start at 3.79085540771484e-05
rand at 14.2052850723267
sort at 19.433424949646
print at 31.1665999889374

real    0m31.411s
user    0m30.318s
sys     0m0.757s


The code:

if $DEBUG
     def ptime(evt)
         $ptimeg ||= Time.now.to_f
         STDERR.puts "#{evt} at #{Time.now.to_f - $ptimeg}"
     end
else
     def ptime(evt)
         # noop
     end
end

# the actual sampling: just store the seen values in a hash and return the
# sorted hash keys
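def sample(cnt, lim)
     # without this check the loop below never ends when cnt > lim
     raise ArgumentError, "cnt must not exceed lim" if cnt > lim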
     x = {}
     tmp = nil

     for i in 0...cnt
         while x.has_key?(tmp = rand(lim))
         end
         x[tmp] = true
     end
     ptime "rand"

     x = x.keys.sort
     ptime "sort"
     x
end

# this is the key to success, but needs lots of ram
GC.disable

ptime "start"

x = sample(cnt=ARGV[0].to_i, ARGV[1].to_i)

# creating the newline string only once saves 5s
nl = "\n"
i = 0
while i+10 <= cnt
     # printing 10 values per call saves about 1s
     print x[i], nl, x[i+1], nl, x[i+2], nl, x[i+3], nl, x[i+4],
     nl, x[i+5], nl, x[i+6], nl, x[i+7], nl, x[i+8], nl, x[i+9], nl
     i += 10
end
for j in i...cnt
     print x[j], nl
end
ptime "print"
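
The hand-unrolled loop above can also be written with each_slice. This is a
different formulation, not what sample.rb does, and I make no claim about its
speed. The helper name "print_batched" and its parameters are made up for
this sketch. It issues one print call per batch of 10 values and produces the
same output, one value per line.

```ruby
require "stringio"

# Hypothetical helper: write the values in batches, one print call per
# batch, with every value followed by a newline.
def print_batched(values, io = $stdout, batch = 10)
  nl = "\n"
  values.each_slice(batch) do |slice|
    io.print slice.join(nl), nl
  end
end

# Usage: 23 values means two full batches and a remainder of 3.
buf = StringIO.new
print_batched((1..23).to_a, buf)
```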