On Tue, 6 Dec 2005, Mauricio [iso-8859-1] FernáÏdez wrote: > On Sun, Dec 04, 2005 at 10:21:02AM +0900, ara.t.howard / noaa.gov wrote: >> On Sun, 4 Dec 2005, Mauricio [iso-8859-1] FernáÏdez wrote: >>> sort_by{ rand } is actually biased, since sort_by will preserve the >>> relative order of the elements for which rand() returned the same value. > > I did some numbers and found the following probability estimates: > > array size P(#rand() collision) > 1000 5.54558e-11 > 10000 5.55056e-09 > 1000000 5.55096e-05 > 10000000 5.53574e-03 > > A collision implies that the associated pair of elements has got the same > ordering in the output array, instead of 50-50 chances of it being reversed. > > More info, including the Ruby code I used, available at > http://eigenclass.org/hiki.rb?sort_by+rand+is+biased > >> does >> >> %w( a b c d ).sort{ rand <=> rand } >> >> help? > > I'm not totally sure it's really unbiased, but at first sight it looks good. the big difference is that a value will have rand called multiple times for it - in otherwords an element compareds differently every time. this is bound to help shuffle more uniformly it seems: harp:~ > cat a.rb %w( a b c d ).sort{|a,b| ra = rand; rb = rand; p a => ra; p b => rb; ra <=> rb } harp:~ > ruby a.rb {"a"=>0.181439380218727} {"c"=>0.579403886629126} {"c"=>0.713513989939958} {"d"=>0.573687508540169} {"a"=>0.851534740906118} {"d"=>0.00156074151427565} {"b"=>0.803591007691647} {"a"=>0.494066110872563} {"b"=>0.834676789626288} {"c"=>0.792106134460754} so, note that "a" was sorted using a different value each time. can anyone who remembers stats and knows the sort algorithm weigh in? -a -- =============================================================================== | ara [dot] t [dot] howard [at] noaa [dot] gov | all happiness comes from the desire for others to be happy. all misery | comes from the desire for oneself to be happy. | -- bodhicaryavatara ===============================================================================