In this essay I'm going to attempt, one final time, to demonstrate that it is possible to have a useful benchmark. I'm going to do so by telling you about a real benchmark that we used to solve a real problem in the Deep Space Network. But first, some clarifications:

1. I have not expressed, and do not hold, an opinion regarding the Alioth Shootout.

2. I have not expressed, and do not hold, an opinion regarding the Ackermann function as a benchmark.

3. I originally entered this discussion because I objected to the assertion "Benchmarks, like statistics, are lies." My main objection was to the inclusion of statistics. Statistics was, for all practical purposes, invented by a scientist (Gauss) to solve physics problems. It is utterly indispensable to the practice of science and engineering. Nowadays, virtually every digital device (MP3 player, cell phone, network interface) is designed using theoretical concepts pioneered by Claude Shannon in 1948. Shannon's information theory is heavily statistical in nature; he called his measure of information "entropy" because its formulation is so similar to the concept of the same name in statistical mechanics. People outside the field of engineering are sometimes surprised to discover just how essential the deep theories of mathematics and statistics are to ordinary things like cell phones and airplanes. For those who are interested, here's an interview with Andrew Viterbi, a first-rate theoretician who made a huge impact on the practical world:
http://www.ieee.org/organizations/history_center/oral_histories/transcripts/viterbi.html
It's an interesting twist that he worked on the Deep Space Network, which features in the story below.

4. As the discussion was more about benchmarks than statistics, I'll give what I believe is a counterexample to the claim. I have several others, some perhaps better than this one, but this is one that I worked on directly.

5.
I've marked this OT because it's not about Ruby and LONG because it's, well, long. I don't expect to have more to say.

First, a little background. In the early 90s I worked on the Deep Space Network, which is managed by JPL for NASA. The purpose of the DSN, among other things, is to communicate with spacecraft beyond earth orbit. This is a tremendous technical challenge. Voyager 1, for example, has a transmitter that produces somewhere around 5 watts of power, about the same as a walkie-talkie. It's currently about 14 billion km from earth. It's hard to fathom just how weak a signal that is. To hear it and actually get data from it requires huge antennas, cryogenically-cooled high-power amplifiers, and exotic error correcting codes based on information theory.

The DSN has antenna complexes in California, Spain, and Australia. Since long before the Internet, these complexes have been connected by digital links carried over geosynchronous satellite circuits. The people who built this system were world-class telecom engineers, not computer guys. The DSN ground system had a lot of custom hardware and software in it, as well as "COTS" products made by smaller manufacturers you've probably never heard of.

By 1990 or so people began to wonder whether the DSN ground infrastructure should be re-implemented using Unix systems and TCP/IP. I was in the group that thought we should. People on the other side had advanced a theoretical argument against TCP/IP, based on the quite true fact that the TCP specification then allowed only 64 kB of sliding window. (Since then the protocol has been extended to work better on "fat pipes".) It was true that you could not completely load a T1 circuit, for example, with a single TCP stream unless the network round-trip time was less than about 330 ms. Unfortunately, the speed of light requires about 230 ms just to go 35,000 km up to the comsat and back down, which meant a round trip from Australia to JPL would take at least 460 ms.
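For readers who want to check the window arithmetic, here is a minimal sketch in Python. The constants are my assumptions for illustration (a nominal 1.544 Mbit/s T1 rate and a 65,535-byte maximum window), and the function names are made up for this example. With these figures the break-even round-trip time comes out near 340 ms, in the same range as the roughly 330 ms quoted above.

```python
import math

# Assumed figures for illustration only.
T1_BITS_PER_SEC = 1_544_000    # nominal T1 line rate
WINDOW_BYTES = 64 * 1024 - 1   # largest value a 16-bit window field can hold

def max_rtt_for_full_load(link_bps, window_bytes):
    """Longest round-trip time (seconds) at which one window still fills the link."""
    return window_bytes * 8 / link_bps

def single_stream_ceiling(window_bytes, rtt_sec):
    """Upper bound on one stream's throughput (bits/s): one window per round trip."""
    return window_bytes * 8 / rtt_sec

def streams_needed(link_bps, window_bytes, rtt_sec):
    """Smallest number of parallel streams whose combined windows cover the link."""
    return math.ceil(link_bps * rtt_sec / (window_bytes * 8))

if __name__ == "__main__":
    rtt = 0.460  # the minimum Australia-to-JPL round trip mentioned above
    print("break-even RTT (s):", max_rtt_for_full_load(T1_BITS_PER_SEC, WINDOW_BYTES))
    print("single-stream ceiling (bit/s):", single_stream_ceiling(WINDOW_BYTES, rtt))
    print("streams needed:", streams_needed(T1_BITS_PER_SEC, WINDOW_BYTES, rtt))
```

Under these assumptions, a 460 ms round trip caps a single stream at roughly 1.1 Mbit/s, about three-quarters of a T1.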
Not good enough. It occurred to me that the flaw in the argument was that the limitation applied only to a single stream, and that one way around it would be to open multiple streams and "inverse multiplex" the data across them. (Other people not at JPL, we discovered later, had the same idea and built multi-stream FTP servers and clients, for example.) So we wanted to test this idea out.

Remember where we are now. We have no application to run on these Unix machines. We have no prototype of an application. We just have an idea, and we want to see if it looks promising. If we find that we can work around the single-stream limitation of TCP, then the Unix/TCP approach is still in the running. If not, it's probably dead. (It might still be a good idea, just not work because we're not smart enough to pull it off. But in any case, this idea is going nowhere unless we can move data.)

So Russ Byrne, Vance Heron (also a Rubyist and reader of this list), and I built a test configuration and wrote some test code. The test configuration used a hardware satellite simulator to impose a variable delay between two routers. We connected a transmitter machine to one router and a receiver to the other. The code was about 600 lines of C, and used BSD sockets and select().

I claim that this code is a benchmark. It satisfies my definition and the one on Wikipedia:
http://en.wikipedia.org/wiki/Benchmark_%28computing%29
It did not remotely approximate our real ground applications, which do a lot of formatting, routing, controlling, accounting, etc. It tested only whether it was possible to achieve something near the theoretical maximum throughput across the satellite simulator using a particular technique. To make a long story shorter, it was.

The ability to control precisely all relevant factors allowed us to answer some important questions. First of all, we showed that it was possible to fully load a T1 in the face of worse-than-realistic delays and bit-error rates.
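The inverse-multiplexing idea described above can be illustrated with a small in-memory sketch. This is not the original test code, which was C with BSD sockets and select(). It only shows the bookkeeping, under the assumption that each chunk carries a sequence number so the receiver can restore the original order. The names stripe and reassemble are hypothetical, and plain Python lists stand in for the TCP streams.

```python
from itertools import chain

def stripe(data: bytes, n_streams: int, chunk_size: int):
    """Split data into numbered chunks and deal them round-robin onto n_streams queues."""
    queues = [[] for _ in range(n_streams)]
    for seq, offset in enumerate(range(0, len(data), chunk_size)):
        chunk = data[offset:offset + chunk_size]
        queues[seq % n_streams].append((seq, chunk))
    return queues

def reassemble(queues):
    """Merge the chunks from every queue back into the original byte order."""
    chunks = sorted(chain.from_iterable(queues), key=lambda item: item[0])
    return b"".join(payload for _, payload in chunks)

if __name__ == "__main__":
    message = bytes(range(256)) * 10
    queues = stripe(message, n_streams=4, chunk_size=100)
    # Each queue would be written to its own TCP connection in a real sender.
    assert reassemble(queues) == message
```

Because each stream has its own window, several of them together can keep more data in flight than a single stream is allowed to, which is the whole point of the technique.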
We compared different Unix implementations, and discovered (surprise) that Sun's TCP stack significantly outperformed some of its competitors'. We compared framing protocols on the serial circuit and found a bug in Cisco's implementation of HDLC. (Cisco already knew about it, but we confirmed it independently.) Later, when we got satellite time, we verified the performance between two machines linked by a 140,000 km path.

And finally, the benchmark allowed me to answer with confidence the most important question I've ever been asked at work. In 1991, the Galileo spacecraft's high-gain antenna failed to open. This appeared at the time to be a devastating blow. The expected data rate at Jupiter without the HGA was some 10,000 times lower than what JPL had planned: about 10 bits per second. Not surprisingly, the project's highest priority became figuring out how to achieve a reasonable fraction of the science objectives before the spacecraft arrived at Jupiter in 1995.

The team ended up recommending some hardware modifications to the ground antennas, development of even more exotic data compression and error-correction codes, changes in operations procedures, and a near-real-time arraying technique that called for combining digital signals from large antennas in Australia and California. That required moving lots of bits between ground complexes and not losing any. The manager of the mission rescue team asked me at one point "If we put in all these routers and Unix boxes, will they support arraying for Galileo?" He needed to know right then. I said "Yes, they will." He did, and they did.

This is not a personal moment of glory. Lots of people worked to get the DSN to the point where it could make a major architectural change. And the team that developed the engineering changes deserves the credit. But one part of succeeding was being able to say "we can do what Galileo needs", and we could say that because we'd used our benchmark to hammer the problem into submission.
We understood it well enough to convince management to trust our conclusions.

Galileo went on to tremendous success:
http://www.newyorker.com/fact/content/?030908fa_fact

Steve