On Mon, Jul 18, 2005 at 02:48:34AM +0900, Bill Kelly wrote:
> My method of "finding the next slot" in a loop on a collision
> may be overly cheesy - I don't know.  I did consider just 
> asking for a new sample until finding one that didn't collide
> (I like Joost's "dumping the values in and checking the
> length" approach.)  However I've tended to avoid that approach,
> since when the number of samples desired approaches the 
> upper_bound, e.g. 5_000_000 5_000_000 worst case, I've been
> burned in the past by such algorithms working very hard to find
> the last few empty slots.  However, it clearly wasn't a problem
> for the 5_000_000 1_000_000_000 parameters for this quiz - I
> just wanted to explain why I did it differently.

I thought about that too, but I chose to ignore the problem because:

a) I assumed that if you want random samples from a set, you generally
want much less samples than the size of the total set.

b) As soon as samples.size > total.size / 2 it's trivial to turn the
algorithm on its head (i.e. randomly select items that you'll skip,
then loop through the set and print all items that aren't selected).
I did a few runs that suggest that sample.size == total.size / 2 is
relatively slow on my implementation, but not impractal. Your milage
may vary if you use a crappy random number generator, ofcourse :-)

Joost.