Stephen Kellett wrote:
> In message <1116294726.4747.30.camel / localhost.localdomain>, Zed A. Shaw
> <zedshaw / zedshaw.com> writes
> 
>> The first thing is that there's no statistical basis for "1000 times".
> 
> There is. The error is smaller. If you don't believe me you need to
> examine why pollsters always ask at least 1000 potential voters their
> opinion. The error rate is +/- 3% with a sample size of approx 1000
> voters. Ask 10 people and predict the election result and your error
> will be much greater than 3%. The pollsters are in it to make money
> predicting outcomes. If they could get away with 5 or 10 samples, they
> would. It would be more profitable. They don't do it that way.

True (mostly), but irrelevant. Those statistics apply to problems of
estimating proportions, but this isn't one.
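
For reference, the +/- 3% figure is what the usual normal-approximation
margin of error for a proportion gives at about 1000 samples. A rough
sketch, assuming the worst case p = 0.5 and a 95% confidence level
(z = 1.96); neither value is stated in the thread, and the function name
is just for illustration:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of the normal-approximation interval for a proportion.

    Assumptions (not from the thread): p = 0.5 is the worst case,
    z = 1.96 corresponds to roughly 95% confidence.
    """
    return z * math.sqrt(p * (1.0 - p) / n)

# Compare a 10-person poll with a 1000-person poll.
for n in (10, 1000):
    print(n, "samples: +/-", round(100 * margin_of_error(n), 1), "%")
```

Note that the formula depends only on n and p. There is no term for how
noisy an individual measurement is, which is why it does not carry over
to timing measurements.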

Characterizing performance of systems like this can be expressed as a
simple linear regression problem:

   t = a + bx + e

where

   t = runtime
   a = fixed overhead (startup, teardown, etc.)
   b = runtime per 'size' unit
   x = size of request or returned data
   e = random error

Choose N values of x and observe their corresponding t values. Estimate
a and b using standard regression techniques.
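
As a minimal sketch of "standard regression techniques", here is
ordinary least squares for the model t = a + bx + e. The function name
and the sample numbers are made up for illustration; they are not
measurements from any real system:

```python
def fit_line(xs, ts):
    """Ordinary least-squares estimates (a, b) for t = a + b*x + e."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = sum(xs) / n
    mean_t = sum(ts) / n
    # Sum of squared deviations of x, and cross-deviations of x and t.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxt = sum((x - mean_x) * (t - mean_t) for x, t in zip(xs, ts))
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b = sxt / sxx             # runtime per 'size' unit
    a = mean_t - b * mean_x   # fixed overhead
    return a, b

# Hypothetical observations: request sizes and runtimes in seconds.
sizes = [100, 200, 400, 800]
runtimes = [0.052, 0.101, 0.205, 0.398]
a_hat, b_hat = fit_line(sizes, runtimes)
print("overhead estimate:", a_hat)
print("per-unit estimate:", b_hat)
```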

The "goodness" (i.e., the variance) of the estimates of a and b depends
on the variance of e and the value of N. If var(e) is small, you can get
good estimates of a and b with small N. In particular, if var(e) = 0,
you can get perfect estimates of a and b with N = 2.
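
To illustrate, here is a small simulation sketch. It assumes Gaussian
noise and invented "true" values for a and b, and all names are
hypothetical. It fits the line repeatedly and reports how much the
estimate of b scatters for different noise levels and sample sizes:

```python
import random
import statistics

def fit_line(xs, ts):
    """Ordinary least-squares estimates (a, b) for t = a + b*x + e."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_t = sum(ts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxt = sum((x - mean_x) * (t - mean_t) for x, t in zip(xs, ts))
    b = sxt / sxx
    return mean_t - b * mean_x, b

# Invented "true" parameters, for illustration only.
TRUE_A, TRUE_B = 0.05, 0.002

def spread_of_b(sigma, n, trials, rng):
    """Standard deviation of the fitted b over repeated simulated runs."""
    # n evenly spaced request sizes between 100 and 1000.
    xs = [100.0 + 900.0 * i / (n - 1) for i in range(n)]
    estimates = []
    for _ in range(trials):
        ts = [TRUE_A + TRUE_B * x + rng.gauss(0.0, sigma) for x in xs]
        estimates.append(fit_line(xs, ts)[1])
    return statistics.pstdev(estimates)

rng = random.Random(0)  # fixed seed so reruns give the same numbers

# No noise: two points determine the line (up to floating-point rounding).
a_hat, b_hat = fit_line([100.0, 1000.0],
                        [TRUE_A + TRUE_B * 100.0, TRUE_A + TRUE_B * 1000.0])
print("N=2, var(e)=0:", a_hat, b_hat)

# With noise: scatter of the b estimate for several (sigma, N) pairs.
for sigma, n in [(0.001, 4), (0.1, 4), (0.1, 100)]:
    print("sigma =", sigma, "N =", n,
          "spread of b =", spread_of_b(sigma, n, 200, rng))
```

The scatter should shrink both when sigma shrinks and when N grows, so a
quiet system needs few samples and a noisy one needs many.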

If I needed 1000 samples to get good estimates of performance of an
information system, I'd stop trying to overcome that with large numbers
and figure out why randomness plays such a large role in the performance
of my system.

Steve