Jacob Fugal <lukfugl / gmail.com> writes:

> On 7/16/05, Olaf Klischat <klischat / cs.tu-berlin.de> wrote:
>> > ezra:~/Sites ez$ head big_sample.txt
>> > 168
>> > 285
>> > 566
>> > 604
>> > 912
>> > 1183
>> > 1335
>> > 1473
>> > 1728
>> > 1919
>> > ezra:~/Sites ez$ tail big_sample.txt
>> > 999998155
>> > 999998313
>> > 999998484
>> > 999998680
>> > 999998825
>> > 999999151
>> > 999999330
>> > 999999465
>> > 999999621
>> > 999999877
>> > ezra:~/Sites ez$
>>
>> Umm... I'm not sure, but that looks a bit too "equidistant" to be
>> truly random, doesn't it?
>>
>> The sample being truly random means that the sample should be a truly
>> "drawing without putting back" (e.g. lottery) sample, so each possible
>> sample occurs with equal probability. So a sample like
>>
>> 0
>> 1
>> 2
>> ..
>> ..
>> ..
>> 4999999
>>
>> should occur with the same probability as any other more "likely" one.
>
> See my other posts in this thread about the actual probabilities. In
> short, since it's fully random, each possible sampling is as likely as
> any other possible sampling, but the number of samplings including at
> least ten numbers in the 99999xxxx range and at least ten numbers in
> the (00000)xxxx range is *much* higher than the number of samplings
> without numbers in those ranges. So the probability of getting a
> sampling that looks evenly spread out is much more likely than getting
> a sampling that's clustered.

Of course. I didn't mean to say that a really "clustered" sample like
0..4999999 (or any other specific sample) has any significant
probability. But look at Ezra's output, annotated with the 200-number
interval boundaries:

               0
> 168
             200
> 285
             400
> 566
             600
> 604
             800
> 912
            1000
> 1183
            1200
> 1335
            1400
> 1473
            1600
> 1728
            1800
> 1919
>
>
>
> 999998155
       999998200
> 999998313
       999998400
> 999998484
       999998600
> 999998680
       999998800
> 999998825
       999999000
> 999999151
       999999200
> 999999330
       999999400
> 999999465
       999999600
> 999999621
       999999800
> 999999877
      1000000000

See? One number per 200-number interval. Every time[1].
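The annotation can be reproduced mechanically. A minimal sketch in Python (chosen only for illustration; `interval_counts` is a hypothetical helper, not code from anyone in this thread) that bins the twenty values from the head/tail output above into 200-wide intervals starting at 0 and at 999998000:

```python
# The twenty values shown by `head` and `tail` in Ezra's output above.
HEAD = [168, 285, 566, 604, 912, 1183, 1335, 1473, 1728, 1919]
TAIL = [999998155, 999998313, 999998484, 999998680, 999998825,
        999999151, 999999330, 999999465, 999999621, 999999877]


def interval_counts(values, start, width, n_intervals):
    """Count the values in each interval [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_intervals
    for v in values:
        i = (v - start) // width  # index of the interval containing v
        if 0 <= i < n_intervals:
            counts[i] += 1
    return counts


head_counts = interval_counts(HEAD, 0, 200, 10)          # covers [0, 2000)
tail_counts = interval_counts(TAIL, 999998000, 200, 10)  # covers [999998000, 1000000000)
print(head_counts)  # prints [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
print(tail_counts)  # prints [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
```

Both lists come out as ten 1s: every one of the twenty intervals holds exactly one sampled value.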
This hints at a wrong implementation. I caught this because I had the
same idea at first :)

[1] Unless I'm mistaken, in the 5e6-from-1e9 sampling each number is
included with probability 5e6/1e9 = 1/200, so the probability that a
sample contains exactly one number from a given 200-number interval is
approximately 200 * (1.0/200) * (199.0/200)**199 = 0.3688. The
probability that this happens for 20 such 200-number intervals is
roughly 0.3688**20 = 2.2e-09.
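To double-check the arithmetic in [1], here is a small Python sketch. It computes the binomial approximation used there (each number included independently with probability 1/200), the exact hypergeometric probability for one fixed 200-number interval, and the 20-interval product, which, like the footnote, treats the intervals as independent. The variable names are illustrative only.

```python
N = 1_000_000_000  # population: the integers 0 ... 999999999
n = 5_000_000      # sample size
W = 200            # interval width (n / N == 1 / W)

# Binomial approximation: each number is included independently with p = n/N.
p = n / N
p_one_binom = W * p * (1 - p) ** (W - 1)

# Exact (hypergeometric) probability that a fixed interval of W numbers
# contains exactly one sampled number:
#   W * C(N-W, n-1) / C(N, n) = W * (n/N) * prod_{j=0}^{W-2} (N-n-j) / (N-1-j)
p_one_exact = W * (n / N)
for j in range(W - 1):
    p_one_exact *= (N - n - j) / (N - 1 - j)

# All 20 observed intervals holding exactly one number, assuming independence
# between intervals (a close approximation for a sample this large).
p_all_20 = p_one_binom ** 20

print(f"one interval, binomial: {p_one_binom:.4f}")
print(f"one interval, exact:    {p_one_exact:.4f}")
print(f"all 20 intervals:       {p_all_20:.2e}")
```

The exact and approximate single-interval values agree to well beyond four decimal places, so the binomial shortcut in the footnote is safe here; the 20-interval product comes out at about 2.2e-09.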