On Jun 19, 2006, at 9:08, Kroeger, Simon (ext) wrote:
> I think it's a little bit to simple to be put in the standard lib.
>
> a.sort_by{rand}.each {|e| print e , ' '}

But it turns out to not be that simple.  This can give a biased sort  
depending on the underlying sort algorithm, and it's not as efficient  
as it could be, as Alex observes:

On Jun 19, 2006, at 10:52, Alex Young wrote:
> For large collections, it'd be much nicer not to have to either  
> duplicate or sort - isn't this something that quasirandom sequences  
> are good for?  Is there some GSL-guru out there who can comment?

A quasi-random sequence is a theoretically better flavour of random  
than a pseudo-random sequence in a lot of cases, but there's no  
direct application to this problem (apart from being used in place of  
Kernel#rand).  It'd be overkill for all but the most specialised  
uses, I'd say.

On Jun 19, 2006, at 11:29, transfire / gmail.com wrote:
> Perhaps...
>>
>>   class  Array
>>      def random_each
>>           ((0...size).to_a.sort_by{rand}).each { |i|
>>             yield self[i]
>>           }
>>      end
>>   end
>
> On second thought, that still sorts an array of the same size. Hmm...

What we're looking for is a shuffle; a random selection of elements  
from an array might not include all the elements from the source,  
whereas a shuffle is a permutation of the source.

Like a lot of simple-sounding problems, this one's already been  
solved, see:

http://www.nist.gov/dads/HTML/perfectShuffle.html

The linked Haskell implementation also gives a very thorough  
discussion, including a demonstration of how a stable sorting  
algorithm will yield biased results.  The imperative solution given  
in that discussion is identical to the Fisher-Yates shuffle on the  
NIST site, which is what I'll use here:
   - for i: 0...n, exchange each element e_i with another element  
e_i...e_n.

Which is the best possible case of O(n), compared to O(n lg n) when  
using #sort_by{rand}.  This is proven to be unbiased (see the  
reference to Knuth given on the NIST site).

Intuitively, the best place for this seems like Enumerable, since  
anything that can be sorted can be shuffled.  Is this the sort of  
thing that needs an RCR?  Anyway, I doubt I'd use it myself, but I  
can see the following arguments for its inclusion:
  - It's easy to do wrong:
    - inefficiently and possibly biased using #sort_by.
    - biased under any other sort of element exchange:
      e.g., exchanging e_i with *any* element in the
      array instead of just the unshuffled elements
      leads to a biased shuffle.
  - It seems fairly generic.
  - People migrating from other languages may expect
    it (e.g., PHP, Java).

On the other hand, people who REALLY care about unbiased shuffles  
will already know how to do them.

Quick Ruby implementation for Array: In-place shuffling is the  
simplest case.  We can use it to create a more Ruby-esque shuffle by  
using an approach similar to trans' one given above: shuffle an array  
of indices and use that as a mapping between the original and  
shuffled arrays.

class Array
   def shuffle!
     self.each_index { |i|
       r = rand(size - i) + i
       tmp = self[i]
       self[i] = self[r]
       self[r] = tmp
     }
   end

   # this would work if dup produced a deep copy.
   def dup_shuffle
     self.dup.shuffle!
   end

   # use an in-place shuffle of an array of indices
   def shuffle
     res = []
     indices = (0...size).to_a.shuffle!
     indices.each_with_index { |x, i|
       block_given? ? yield(self[x]) : res[i] = self[x]
     }
     block_given? ? self : res
   end
end

matthew smillie.