On Sun, Mar 30, 2003 at 04:07:29AM +0900, Phil Tomson wrote:
> 
> Yes, the lighter-weight solution definately is faster, but how it works is 
> not obvious to me either.

Explanation (includes ASCII pictures :-)


distribution function 
=====================

We define the distribution function (d.f. or dist. func.) of a random
variable as

  F(x) = P( X <= x )

that is the probability of the variable X being less than x; therefore
 lim_{x->-inf}{F(x)} = 0  
 lim_{x->+inf}{F(x)} = 1

Say that we want to generate values like val1, val2, ..., valn. It is
possible to map integers to those values, so we generate a number and
map to the final value. This is helpful for the explanation as it makes
graphing the distribution function easier.

Graph:

here's the d.f. of a fair dice
 p(X <= x)
       |
   1   |                                   *-------------------
   5/6 |                             *-----
   4/6 |                       *-----
   3/6 |                 *-----
   2/6 |           *-----
   1/6 |     *----- 
       |
   ----------====================================
       0     1     2     3     4     5     6       x

 FIG. 1

there are 6 discontinuities at 1...6 which means that the random
variable is discrete (ie., the prob. of having, say, 1, is not zero). 
If there was none, it'd mean the random variable is continuous and the
probability of having any given number would be zero (but then you'd
probably work with the probability density function ;-).

Note that the height of the "jump" at each discontinuity is the
probability of the event "the variable has the value <val where the
discontinuity happens>". This is because of the '=' in p(X <= x)

The function can also be defined piece-wise:
F(x) = 
               0      	     x <  1
               1/6	1 <= x <  2
               2/6	2 <= x <  3
               3/6	3 <= x <  4
               4/6	4 <= x <  5
               5/6	5 <= x <  6
               1	6 <= x 

If we want to generate one value amongst n, with different
probabilities, the "jumps" in the d.f. graph will have different height 
   |
   |                               *-------------------------
   |                           *---
   |                       *---
   |                   *---
   |
   |               *---
   | 
   |                   
   |           *---
   |      
   |
   |       *---
   |   *---
   |
   |
   |
   |
-------===============================================================
   0   1   2   3   4   5   6   7   8

 FIG. 2

We can then map 
    1 => val1
    .....
    n => valn


Generating a value
=================

Now what is happening is that we're walking on the y axis, drawing an
horizontal line for the given Y coordinate (generated as a uniform
random var. in [0,1]) and then given the intersection point w/ the
dist. func, taking the corresponding value on the horizontal axis:


   |
   |                               *-------------------------
   |                           *---
   |                       *---
 p |===================*---
   |                   |
   |               *---|
   |                   |
   |                   |
   |           *---    |
   |                   |
   |                   |
   |       *---        |
   |   *---            |
   |                   |
   |                   |
   |                   |
   |                   |
-------===============================================================
   0   1   2   3   4   5   6   7   8
                      ===

 FIG 3.

The reason why we don't care about the order:
============================================

we're in fact selecting the value on the Y axis, in the interval from 0
to 1, which is partitioned in n intervals of different sizes (depending
on the weights of the values).
            
      | A |   B  |  C  | D |  E         | F    |
      ==========================================
      0                                        1
  FIG 4.

It is obvious that given a truly uniform RNG in [0,1], the order of
these values doesn't matter; only the sizes of the intervals do. This
means in fact that we don't care about the particular mapping
   n => value_n

we could just as well use
  1 => valueN
  ....
  N => value1     
(or any other)

but just making sure that the jumps in the dist. func. correspond to the
probabilities of each of the discrete events.

For reference here's Tom's code

  def select
    p = rand
    each do |prob, value|
      return value if p < prob
      p -= prob
    end
    nil
  end

It is essentially going up the Y axis, until the probability is higher
than p; it then returns the corresponding value (see figure 3).

-- 
 _           _                             
| |__   __ _| |_ ___ _ __ ___   __ _ _ __  
| '_ \ / _` | __/ __| '_ ` _ \ / _` | '_ \ 
| |_) | (_| | |_\__ \ | | | | | (_| | | | |
|_.__/ \__,_|\__|___/_| |_| |_|\__,_|_| |_|
	Running Debian GNU/Linux Sid (unstable)
batsman dot geo at yahoo dot com

/*
 *     Please skip to the bottom of this file if you ate lunch recently
 *                             -- Alan
 */
	-- from Linux kernel pre-2.1.91-1