Robert Dober wrote:
> Ok let us get off our nice host thread, which is much better of course.
> 
> Austin what you are suggesting seems very interesting to me, you claim that
> we do not know anything about benchmarking.
> For myself I accept this as a safe and comfortable working theory.
> I am more than willing to learn though (to know even less afterwards but
> philosophy can wait, unless Ara is with us;).
> So it is Ed, if I read correctly, who could teach us some tricks, R U with
> us Ed?
> 
> Links?
> 
> I am looking forward to this.
> 
> Cheers
> Robert
> 
Yeah, I'm with you. I actually took a look at the shootout page. First
of all, it isn't as bad a site as some people make it out to be. Second,
they are running Debian and Gentoo, which means almost anyone could
duplicate their work (assuming the whole enchilada can be downloaded as
a tarball).

Analysis Phase (Trick 1):

1. Collect the whole matrix of benchmarks. The rows will be benchmark
names and the columns will be languages, and the cells in the matrix
will be benchmark run times. Pick a language to be the "standard". C is
probably the obvious choice, since it's likely to be the most "practical
low-level language" (meaning not as many folks know Forth.) :)

2. Now compute the *natural log* of the ratio of each language's time to
the standard's time for each benchmark. In some convenient statistics
package (a spreadsheet works fine, but I'd do it in R because the kernel
density estimators, boxplots, etc. are built in), compute the histograms
(or kernel density estimators, or boxplots, or all of the above) of those
log ratios for each language. That tells you how the ratios are
distributed.

Example:

	Ruby	Perl	Python	PHP	C
Bench1	tr1	tp1	ty1	th1	tc1
Bench2	tr2	tp2	ty2	th2	tc2
Bench3	tr3	tp3	ty3	th3	tc3


	Ruby		Perl		Python		PHP		C
Bench1	ln(tr1/tc1)	ln(tp1/tc1)	ln(ty1/tc1)	ln(th1/tc1)	0
Bench2	ln(tr2/tc2)	ln(tp2/tc2)	ln(ty2/tc2)	ln(th2/tc2)	0
Bench3	ln(tr3/tc3)	ln(tp3/tc3)	ln(ty3/tc3)	ln(th3/tc3)	0

And then take the histograms of the columns (smaller is better).
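The two tables above are easy to compute with a short script. A minimal
sketch in Ruby, with made-up timings standing in for the real shootout
numbers (C as the standard):

```ruby
# Hypothetical timings (seconds); rows are benchmarks, columns are languages.
# In practice these would come from the shootout results.
times = {
  "Bench1" => { "Ruby" => 4.2, "Perl" => 3.1, "Python" => 2.8, "PHP" => 5.0, "C" => 0.9 },
  "Bench2" => { "Ruby" => 7.5, "Perl" => 6.2, "Python" => 5.9, "PHP" => 8.1, "C" => 1.1 },
  "Bench3" => { "Ruby" => 2.0, "Perl" => 2.4, "Python" => 1.7, "PHP" => 3.3, "C" => 0.5 },
}

# Natural log of each language's time relative to the standard (C).
# ln(t/tc) is 0 for C itself, positive when slower, negative when faster.
log_ratios = times.map do |bench, row|
  tc = row["C"]
  [bench, row.map { |lang, t| [lang, Math.log(t / tc)] }.to_h]
end.to_h

log_ratios.each do |bench, row|
  puts "#{bench}: " + row.map { |l, r| "#{l}=#{'%.2f' % r}" }.join("  ")
end
```

From here the per-language columns go into whatever tool draws your
histograms or density plots.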

Tuning Phase (Trick 2):

Find the midpoints (medians) of the density curves, boxplots or
histograms. These are the "typical" benchmarks. They are more
representative than the outliers. I saw one, for example, where Ruby was
over 100 times as fast as Perl. That's not worth investing any time in --
it's some kind of fluke: either something Perl sucks at, something Ruby
is wonderful at, or a better implementation in the Ruby code than in the
Perl code.
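Picking the "typical" benchmarks can be automated too: for each language,
take the benchmarks whose log ratios sit nearest the median. A sketch,
again with invented numbers:

```ruby
# Hypothetical per-benchmark log ratios for one language (say Ruby vs. C).
# A value like Bench9's is an outlier -- not worth tuning against.
ratios = {
  "Bench1" => 1.5, "Bench2" => 1.9, "Bench3" => 1.4,
  "Bench4" => 2.3, "Bench5" => 1.7, "Bench9" => 4.8,
}

# Median of the log ratios: the "midpoint" of the distribution.
sorted = ratios.values.sort
mid = sorted.size.odd? ? sorted[sorted.size / 2] :
      (sorted[sorted.size / 2 - 1] + sorted[sorted.size / 2]) / 2.0

# The benchmarks closest to the median are the representative ones.
typical = ratios.min_by(3) { |_, r| (r - mid).abs }.map(&:first)

puts "median log ratio: #{'%.2f' % mid}"
puts "typical benchmarks: #{typical.join(', ')}"
```

Those are the ones worth profiling in the next step.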

Now you build a "profiling Ruby", run the mid-range benchmarks with
profiling, and see where Ruby is spending its time. If you happen to
have a friend on the YARV team or the Cardinal team, have them run the
benchmarks too.
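Even short of building a special profiling Ruby, you can get a crude
picture from inside the language itself. A sketch using TracePoint from
the stdlib (Ruby 2.0+), with a toy workload standing in for one of the
mid-range benchmarks:

```ruby
# Count Ruby-level method calls with TracePoint -- a crude stand-in for a
# real profiling build, but enough to see where the hot spots might be.
counts = Hash.new(0)
tracer = TracePoint.new(:call) do |tp|
  counts["#{tp.defined_class}##{tp.method_id}"] += 1
end

# Toy workload standing in for a real benchmark.
def inner_loop
  (1..100).reduce(:+)
end

tracer.enable { 10.times { inner_loop } }

# Hottest call sites first.
counts.sort_by { |_, n| -n }.first(5).each do |name, n|
  puts "#{n}\t#{name}"
end
```

Note this only sees Ruby-level calls (C-implemented methods show up as
`:c_call` events, not `:call`), so a real profiling build still tells you
more about where the interpreter itself burns time.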

Some other tricks:

Once you know where Ruby is spending its time, play with compiler flags.
gcc has oodles of possible optimizations, and gcc itself was tuned by
processes like this. It's worth spending a lot of time getting the
compile of the Ruby interpreter right, since the resulting binary is
going to be run often.

Those are simple "low-hanging fruit" tricks ... stuff you can do without
actually knowing what's going on inside the Ruby interpreter. It will be
painfully obvious from the profiles, I think, where the opportunities are.