Okay, so I didn't get to it this weekend, but it is an interesting
project.  Can you explain a bit more about the data requirements?  I
have some questions inline.

On Sat, May 05, 2007 at 11:20:05AM +0900, Bil Kleb wrote:
> Hi,
> 
> Jeremy Hinegardner wrote:
> >
> >If you want to describe your data needs a bit, and what operations you
> >need to operate on it, I'll be happy to play around with a ruby/sqlite3
> >program and see what pops out.  
> 
> I've created a small tolerance DSL, and coupled with the Monte Carlo
> Method[1] and the Pearson Correlation Coefficient[2], I'm performing
> sensitivity analysis[2] on some of the simulation codes used for our
> Orion vehicle[3].  In other words, jiggle the inputs, and see how
> sensitive the outputs are and which inputs are the most influential.
> 
> The current system[5] works, and after the YAML->Marshal migration,
> it scales well enough for now.  The trouble is the entire architecture
> is wrong if I want to monitor the Monte Carlo statistics to see
> if I can stop sampling, i.e., whether the statistics have converged.
>
> The current system consists of the following steps:
> 
>  1) Prepare a "sufficiently large" number of cases, each with random
>     variations of the input parameters per the tolerance DSL markup.
>     Save all these input variables and all their samples for step 5.

So the DSL generates a 'large' number of cases listing the input
parameters and their associated values.  Is the list of input parameters
static across all cases, or only within a set of cases?

That is, for a given "experiment" you have the same set of parameters,
just with a "large" number of different values to put in those
parameters.
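
If that's right, here's a quick sketch of what I'm picturing -- the
parameter names, nominal values, and tolerance are all made up, not
from your DSL:

```ruby
# Hypothetical case generation: every case has the same parameter
# names; only the sampled values differ.  Nominals/tolerance invented.
NOMINALS  = { mass: 100.0, drag: 0.3, thrust: 5.0e4 }  # name => nominal
TOLERANCE = 0.05                                       # +/-5% uniform jiggle

def make_case(nominals, tol)
  nominals.each_with_object({}) do |(name, nominal), c|
    # jiggle each nominal by a uniform draw in [-tol, +tol]
    c[name] = nominal * (1.0 + tol * (2.0 * rand - 1.0))
  end
end

cases = Array.new(4) { make_case(NOMINALS, TOLERANCE) }
cases.each { |c| p c.keys }  # same key set for every case
```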


>  2) Run all the cases.
>  3) Collect all the samples of all the outputs of interest.

I'm also assuming that the outputs for a given "experiment" are
consistent across cases, i.e., every case produces the same set of
outputs?

>  4) Compute running history of the output statistics to see
> >     if they have converged, i.e., the "sufficiently large"
>     guess was correct -- typically a wasteful number of around 3,000.
>     If not, start at step 1 again with a bigger number of cases.

So right now, for a given experiment f, you are doing:

    inputs  : i1,i2,i3,...,in
    outputs : o1,o2,o3,...,om

run f(i1,i2,i3,...,in) -> [o1,...,om], where the values for i1,...,in
are "jiggled", and you have around 3,000 different sets of inputs.
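
For the convergence check in step 4, I imagine keeping a running
mean/variance per output (Welford's method) so you never have to
re-scan old samples -- just a sketch, the class name is mine:

```ruby
# Running mean and (sample) variance via Welford's online algorithm.
class RunningStats
  attr_reader :n, :mean

  def initialize
    @n    = 0
    @mean = 0.0
    @m2   = 0.0  # sum of squared deviations from the running mean
  end

  def add(x)
    @n    += 1
    delta  = x - @mean
    @mean += delta / @n
    @m2   += delta * (x - @mean)
  end

  def variance
    @n > 1 ? @m2 / (@n - 1) : 0.0
  end
end

stats = RunningStats.new
[3.0, 3.1, 2.9, 3.0].each { |x| stats.add(x) }
```

You'd watch mean/variance settle as samples arrive instead of
recomputing over the whole history each time.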

>  5) Compute normalized Pearson correlation coefficients for the
>     outputs and see which inputs they are most sensitive to by
>     using the data collected in steps 1 and 3.
>  6) Lobby for experiments to nail down these "tall pole" uncertainties.
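
Out of curiosity, is step 5 essentially this?  A plain-Ruby Pearson r
-- the method name is mine, and I'm assuming modern Array#sum:

```ruby
# Pearson correlation coefficient between two equal-length samples.
def pearson(xs, ys)
  n   = xs.size.to_f
  mx  = xs.sum / n
  my  = ys.sum / n
  cov = xs.zip(ys).sum { |x, y| (x - mx) * (y - my) }
  sx  = Math.sqrt(xs.sum { |x| (x - mx)**2 })
  sy  = Math.sqrt(ys.sum { |y| (y - my)**2 })
  cov / (sx * sy)
end

inputs  = [1.0, 2.0, 3.0, 4.0]
outputs = [2.1, 3.9, 6.2, 7.8]  # roughly 2x the inputs
pearson(inputs, outputs)        # close to 1.0
```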
>
> This system is plagued by the question of "sufficiently large"?
> The next generation system would do steps 1 through 3 in small
> batches, and at the end of each batch, check for the statistical
> convergence of step 4.  If convergence has been reached, shutdown
> the Monte Carlo process, declare victory, and proceed with steps
> 5 and 6.
[...] 
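
So the next-generation driver would look roughly like this?  The batch
size, the stopping rule, and the stand-in run_case are all my guesses:

```ruby
# Sketch of the batched architecture: run small batches of cases and
# stop once the running mean of an output stops moving.  The stopping
# rule (absolute change in the mean) and all constants are invented.
def run_case
  10.0 + (2.0 * rand - 1.0)  # stand-in for one jiggled simulation run
end

def run_until_converged(batch: 50, max_n: 3_000, tol: 1.0e-3)
  sum       = 0.0
  n         = 0
  prev_mean = nil
  loop do
    batch.times { sum += run_case; n += 1 }
    mean = sum / n
    return [n, mean] if prev_mean && (mean - prev_mean).abs < tol
    return [n, mean] if n >= max_n  # give up at the old fixed cap
    prev_mean = mean
  end
end

n, mean = run_until_converged
puts "stopped after #{n} cases, mean ~ #{mean.round(3)}"
```

With real convergence criteria (step 4's running statistics per
output) the loop would stop well short of the fixed 3,000-case guess
whenever the statistics settle early.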

> [5] The current system consists of 5 Ruby codes at ~40 lines each
> plus some equally tiny library routines.

Would you be willing to share these?  I'm not sure if what I'm assuming
about your problem is correct or not, but it intrigues me and I'd like
to fiddle with the general problem :-).  I'd be happy to talk offlist
too.

enjoy,

-jeremy

-- 
========================================================================
 Jeremy Hinegardner                              jeremy / hinegardner.org