In this essay I'm going to attempt, one final time, to demonstrate
that it is possible to have a useful benchmark. I'm going to do
so by telling the story of a real benchmark that we used to solve a real
problem in the Deep Space Network. But first, some clarifications:

1. I have not expressed, and do not hold, an opinion regarding the
Alioth Shootout.

2. I have not expressed, and do not hold, an opinion regarding the
Ackermann function as a benchmark.

3. I originally entered this discussion because I objected to
the assertion "Benchmarks, like statistics, are lies." My main
objection was to the inclusion of statistics. Statistics was, for all
practical purposes, invented by a scientist (Gauss) to solve physics
problems. It is utterly indispensable to the practice of science and
engineering. Nowadays, virtually every digital device (MP3 player,
cell phone, network interface) is designed using theoretical concepts
pioneered by Claude Shannon in 1948. Shannon's information theory is
heavily statistical in nature; he called his measure of information
"entropy" because its formulation is so similar to the concept of
the same name in statistical mechanics. People outside the field of
engineering are sometimes surprised to discover just how essential
the deep theories of mathematics and statistics are to ordinary
things like cell phones and airplanes. For those who are interested,
here's an interview with Andrew Viterbi, a first-rate theoretician
who made a huge impact on the practical world:
http://www.ieee.org/organizations/history_center/oral_histories/transcripts/viterbi.html.
It's an interesting twist that he worked on the Deep Space Network,
which features in the story below.

4. As the discussion was more about benchmarks than statistics,
I'll give what I believe is a counterexample to the claim. I have
several others, some perhaps better than this one, but this is one
that I worked on directly.

5. I've marked this OT because it's not about Ruby and LONG because
it's, well, long. I don't expect to have more to say.

First, a little background. In the early 90s I worked on the Deep
Space Network, which is managed by JPL for NASA. The purpose of the
DSN, among other things, is to communicate with spacecraft beyond
earth orbit. This is a tremendous technical challenge. Voyager 1,
for example, has a transmitter that produces somewhere around 5
watts of power, about the same as a walkie-talkie. It's currently
about 14 billion km from earth. It's hard to fathom just how weak a
signal that is. To hear it and actually get data from it requires
huge antennas, cryogenically cooled low-noise amplifiers, and
exotic error correcting codes based on information theory.

The DSN has antenna complexes in California, Spain, and
Australia. Since long before the Internet, these complexes have been
connected by digital links carried over geosynchronous satellite
circuits.

The people who built this system were world-class telecom engineers,
not computer guys. The DSN ground system had a lot of custom
hardware and software in it, as well as "COTS" products made by
smaller manufacturers you've probably never heard of. By 1990 or
so people began to wonder whether the DSN ground infrastructure
should be re-implemented using Unix systems and TCP/IP. I was in
the group that thought we should.

People on the other side had advanced a theoretical argument against
TCP/IP, based on the quite true fact that the TCP specification then
allowed only a 64 kB sliding window. (Since then the protocol has
been extended to work better on "fat pipes".) Because a sender can
have at most one window's worth of unacknowledged data in flight, a
single stream's throughput is capped at the window size divided by
the round-trip time. So it was true that you could not completely
load a T1 circuit, for example, with a single TCP stream unless the
network round-trip time was less than about 330 ms.  Unfortunately,
the speed of light requires about 230 ms just to go 35,000 km up to
the comsat and back down, which meant a round trip from Australia
to JPL would take at least 460 ms. Not good enough.
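
If you want a quick back-of-the-envelope check of that limit, here's
a tiny C sketch. The window size and T1 line rate are the standard
textbook figures, and the 460 ms round trip is the Australia-to-JPL
case above; nothing here is from our actual test setup.

    /* Back-of-the-envelope check of the single-stream TCP limit.
     * With at most 64 kB of unacknowledged data in flight, a
     * stream's throughput is capped at window / RTT. */
    #include <stdio.h>

    int main(void)
    {
        double window_bits = 65535.0 * 8.0; /* maximum TCP window, in bits */
        double t1_bps      = 1544000.0;     /* T1 line rate                */
        double rtt_geo     = 0.46;          /* Australia-JPL round trip, s */

        /* RTT below which one stream can still fill a T1
         * (roughly the figure quoted above) */
        printf("break-even RTT: %.0f ms\n", 1000.0 * window_bits / t1_bps);

        /* best a single stream can do over the satellite path */
        printf("single-stream cap at %.0f ms RTT: %.0f of %.0f kbit/s\n",
               1000.0 * rtt_geo, window_bits / rtt_geo / 1000.0,
               t1_bps / 1000.0);
        return 0;
    }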

It occurred to me that the flaw in the argument was that the
limitation applied only to a single stream, and that one way around
it would be to open multiple streams and "inverse multiplex" the
data across them. (Other people not at JPL, we discovered later,
had the same idea and built multi-stream FTP servers and clients,
for example.) So we wanted to test this idea out.
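
To make the idea concrete, here's a minimal sketch of the sending
side of such an inverse-multiplexed transfer: open several TCP
streams to the same receiver and stripe fixed-size blocks across
them round-robin. The address, port, stream count, and block size
are made up for illustration; this shows the technique, not our
actual test code.

    /* Sketch: "inverse multiplexing" a bulk transfer across several
     * TCP streams.  Each stream has its own 64 kB window, so together
     * they can keep a long, fat pipe full even though no single
     * stream can. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>

    #define NSTREAMS 8        /* parallel connections (made up)   */
    #define BLOCK    8192     /* bytes written per turn (made up) */
    #define NBLOCKS  10000    /* total blocks to send             */

    int main(void)
    {
        int fds[NSTREAMS];
        struct sockaddr_in addr;
        char block[BLOCK];
        int i, n;

        memset(&addr, 0, sizeof addr);
        addr.sin_family = AF_INET;
        addr.sin_port = htons(5000);                      /* arbitrary test port   */
        inet_pton(AF_INET, "192.0.2.1", &addr.sin_addr);  /* documentation address */

        /* Open NSTREAMS independent connections to the receiver. */
        for (i = 0; i < NSTREAMS; i++) {
            fds[i] = socket(AF_INET, SOCK_STREAM, 0);
            if (fds[i] < 0 ||
                connect(fds[i], (struct sockaddr *) &addr, sizeof addr) < 0) {
                perror("connect");
                return 1;
            }
        }

        memset(block, 'x', sizeof block);

        /* Stripe blocks across the streams round-robin.  A real test
         * program would sequence-number the blocks so the receiver
         * could put them back in order. */
        for (n = 0; n < NBLOCKS; n++) {
            if (write(fds[n % NSTREAMS], block, BLOCK) != BLOCK) {
                perror("write");
                return 1;
            }
        }

        for (i = 0; i < NSTREAMS; i++)
            close(fds[i]);
        return 0;
    }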

Remember where we are now. We have no application to run on these
Unix machines. We have no prototype of an application. We just have
an idea, and we want to see if it looks promising. If we find that
we can work around the single-stream limitation of TCP, then the
Unix/TCP approach is still in the running. If not, it's probably
dead. (It might still be a good idea, just not work because we're
not smart enough to pull it off.  But in any case, this idea is
going nowhere unless we can move data.)

So Russ Byrne, Vance Heron (also a Rubyist and reader of this list),
and I built a test configuration and wrote some test code. The
test configuration used a hardware satellite simulator to impose
a variable delay between two routers. We connected a transmitter
machine to one router and a receiver to the other.  The code was
about 600 lines of C, and used BSD sockets and select().
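
For flavor, here's a matching sketch of what the receiving side of
such a test might look like, using BSD sockets and select() as our
code did. Again, the constants are invented and this only illustrates
the structure, not the original 600 lines.

    /* Sketch: receiver for the striped transfer.  select() drains
     * whichever of the NSTREAMS connections has data ready; the byte
     * count divided by elapsed time gives the achieved throughput. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <netinet/in.h>
    #include <sys/select.h>
    #include <sys/socket.h>

    #define NSTREAMS 8
    #define BLOCK    8192

    int main(void)
    {
        int lfd, fds[NSTREAMS], maxfd = 0, i, one = 1;
        struct sockaddr_in addr;
        long long total = 0;

        lfd = socket(AF_INET, SOCK_STREAM, 0);
        setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one);

        memset(&addr, 0, sizeof addr);
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(5000);          /* must match the sender */
        if (bind(lfd, (struct sockaddr *) &addr, sizeof addr) < 0) {
            perror("bind");
            return 1;
        }
        listen(lfd, NSTREAMS);

        for (i = 0; i < NSTREAMS; i++) {
            fds[i] = accept(lfd, NULL, NULL);
            if (fds[i] > maxfd)
                maxfd = fds[i];
        }

        /* Read from whichever streams are ready until all have closed. */
        for (;;) {
            fd_set rset;
            char buf[BLOCK];
            int open = 0;

            FD_ZERO(&rset);
            for (i = 0; i < NSTREAMS; i++)
                if (fds[i] >= 0) {
                    FD_SET(fds[i], &rset);
                    open++;
                }
            if (open == 0)
                break;

            if (select(maxfd + 1, &rset, NULL, NULL, NULL) < 0)
                break;

            for (i = 0; i < NSTREAMS; i++) {
                if (fds[i] >= 0 && FD_ISSET(fds[i], &rset)) {
                    ssize_t n = read(fds[i], buf, sizeof buf);
                    if (n <= 0) {
                        close(fds[i]);
                        fds[i] = -1;      /* stream finished */
                    } else {
                        total += n;
                    }
                }
            }
        }

        printf("received %lld bytes\n", total);
        return 0;
    }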

I claim that this code is a benchmark. It
satisfies my definition and the one on Wikipedia:
http://en.wikipedia.org/wiki/Benchmark_%28computing%29.  It did
not remotely approximate our real ground applications, which do
a lot of formatting, routing, controlling, accounting, etc. It
tested only whether it was possible to achieve something near the
theoretical maximum throughput across the satellite simulator using
a particular technique.

To make a long story shorter, it was. The ability to control
precisely all relevant factors allowed us to answer some important
questions. First of all, we showed that it was possible to fully
load a T1 in the face of worse-than-realistic delays and bit-error
rates.  We compared different Unix implementations, and discovered
(surprise) that Sun's TCP stack significantly outperformed some
of their competitors. We compared framing protocols on the serial
circuit and found a bug in Cisco's implementation of HDLC.  (Cisco
already knew about it, but we confirmed it independently.) Later,
when we got satellite time, we verified the performance between two
machines linked by a 140,000 km path. And finally, the benchmark
allowed me to answer with confidence the most important question
I've ever been asked at work.

In 1991, the Galileo spacecraft's high-gain antenna failed to open.
This appeared at the time to be a devastating blow. The expected
data rate at Jupiter without the HGA was a factor of some 10,000
below what JPL had planned. (10 bits per second.) Not surprisingly,
the project's highest priority became figuring out how to achieve a
reasonable fraction of the science objectives before the spacecraft
arrived at Jupiter in 1995. The team ended up recommending some
hardware modifications to the ground antennas, development of even
more exotic data compression and error-correction codes, changes
in operations procedures, and a near-real-time arraying technique
that called for combining digital signals from large antennas in
Australia and California. That required moving lots of bits between
ground complexes and not losing any. The manager of the mission
rescue team asked me at one point "If we put in all these routers
and Unix boxes, will they support arraying for Galileo?" He needed
to know right then. I said "Yes, they will." He did, and they did.

This is not a personal moment of glory. Lots of people worked to
get the DSN to the point where it could make a major architectural
change. And the team that developed the engineering changes deserves
the credit.  But one part of succeeding was being able to say "we
can do what Galileo needs", and we could say that because we'd used
our benchmark to hammer the problem into submission. We understood
it well enough to convince management to trust our conclusions.

Galileo went on to tremendous success:
http://www.newyorker.com/fact/content/?030908fa_fact.

Steve