On Sun, Oct 07, 2007 at 09:58:02AM +0900, M. Edward (Ed) Borasky wrote:
> M. Edward (Ed) Borasky wrote:
> 
> OK ... here's the analysis. The attached PDF is what's known as a box 
> and whisker plot, usually shortened to "boxplot". The raw numbers that 
> went into this are from the Alioth shootout page, and what I *haven't* 
> done is checked out which versions of Perl, Python, YARV, Ruby, jRuby 
> and PHP these tests used. They could be years old or they could have 
> been run yesterday morning. :) I discarded all the tests for which there 
> were any missing values.

Nice.  I wasn't expecting anything as effective for a quick, seat-of-the-pants
analysis as a box plot.  What did you use to develop the graphic?  I'm
curious. . . .


> 
> The value plotted is the ratio of seconds for the benchmark on the 
> dynamic language to the seconds for the benchmark with gcc. Thus, gcc 
> equals 1.0 across the board and lower is better/faster. How do you 
> interpret these plots?
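To make the normalization concrete, here is a minimal Python sketch of the per-benchmark ratio described above: the dynamic language's run time divided by gcc's run time on the same benchmark. The benchmark names and timings are made-up placeholders, not figures from the shootout, and `ratios_vs_gcc` is a hypothetical helper name.

```python
# Hypothetical timings in seconds; NOT real shootout data.
gcc_seconds = {"nbody": 2.0, "fannkuch": 4.0, "spectralnorm": 1.0}
lang_seconds = {"nbody": 30.0, "fannkuch": 20.0, "spectralnorm": 45.0}


def ratios_vs_gcc(lang, gcc):
    """Return {benchmark: lang_time / gcc_time} for benchmarks present in both.

    Benchmarks missing from either side are skipped, mirroring the choice
    to discard tests with missing values.
    """
    common = sorted(set(lang) & set(gcc))
    return {name: lang[name] / gcc[name] for name in common}


if __name__ == "__main__":
    for name, r in ratios_vs_gcc(lang_seconds, gcc_seconds).items():
        # gcc itself would score exactly 1.0; larger means slower than gcc
        print(f"{name:12s} {r:6.1f}")
```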

Correct me if I'm wrong, but . . . won't doing this as a gcc-comparative
ratio bias slower languages toward an exaggerated upper bound on the box?

I admit I haven't done this sort of thing in a while, so my sorta
heuristic analysis of what I see may be prone to minor glitches.


> 
> The whisker on the bottom is approximately the 5th percentile. In other 
> words, only five percent of the time is your performance going to be 
> that good or better. And the whisker on the top is approximately the 
> 95th percentile -- only five percent of the time is it going to be that 
> bad or worse.

Both yarv and ruby appear to be prone to poor-performance outliers, or
perhaps to a "long tail" on the slow side of things.


> 
> The bottom of the box is the 25th percentile. 25 percent of the time, 
> the performance will be that good or better. The top of the box is the 
> 75th percentile. 25 percent of the time it will be that bad or worse.
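A minimal sketch of how the five values behind each box (approximate 5th percentile, Q1, median, Q3, approximate 95th percentile) could be computed from a list of ratios. It uses Python's `statistics.quantiles` with `method="inclusive"`; the percentile convention actually used for the attached plot isn't stated, so this is one reasonable choice, not necessarily the one used. `box_summary` is a hypothetical name.

```python
import statistics


def box_summary(ratios):
    """Return (p5, q1, median, q3, p95) for a list of benchmark ratios.

    statistics.quantiles(..., n=20) returns 19 cut points at 5%, 10%, ..., 95%.
    The "inclusive" method interpolates within the observed data range.
    """
    cuts = statistics.quantiles(ratios, n=20, method="inclusive")
    # cuts[0]=5th, cuts[4]=25th, cuts[9]=50th, cuts[14]=75th, cuts[18]=95th
    return cuts[0], cuts[4], cuts[9], cuts[14], cuts[18]
```

For example, `box_summary(list(range(1, 22)))` gives `(2.0, 6.0, 11.0, 16.0, 20.0)`.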
> 
> So now we can make Chad Perrin's notion of "general neighborhoods of
> performance" more precise. We do this by looking at the height of 
> the median and the width of the box. And we see that relative to gcc, 
> YARV, Python, Perl and PHP have fairly close medians and the boxes are 
> about the same width. So they are all "in the same neighborhood" and gcc 
> is faster. Ruby and jRuby are *not* in the same neighborhood.

Actually, the ruby and php medians (using lower-case to denote
implementations) are almost identical, and perl's isn't far off,
according to this.  It appears that only in instances approaching
worst-case does ruby really begin to suffer.  Also, unless I'm missing
the implications of a gcc-ratio comparison, I'm not sure we can trust
the huge visual differences in the heights of the boxes to indicate
significant performance problem areas.

I'm also not sure the jruby implementation's numbers here are
representative of JRuby's strengths.  Does JRuby take advantage of Java's
optimizing VM for long-running processes?  I suspect that, in
micro-benchmarks, it's suffering from increased start-up times as it
loads the VM.
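A rough way to think about the start-up question: if a run costs a fixed start-up time plus a per-iteration time, the measured ratio against gcc depends on how long the benchmark runs. The numbers below are hypothetical, chosen only to show the shape of the effect; they are not JRuby's actual start-up or per-iteration costs, and `measured_ratio` is a made-up helper.

```python
def measured_ratio(startup, per_iter, gcc_startup, gcc_per_iter, iterations):
    """Ratio of total wall time (startup + iterations * per_iter) vs. gcc."""
    total = startup + iterations * per_iter
    gcc_total = gcc_startup + iterations * gcc_per_iter
    return total / gcc_total


if __name__ == "__main__":
    # Hypothetical: 1.0 s VM start-up, 10x gcc's steady-state cost per iteration.
    for n in (10, 1_000, 100_000):
        print(n, round(measured_ratio(1.0, 1e-3, 0.001, 1e-4, n), 1))
```

Short runs are dominated by start-up; as the iteration count grows, the ratio approaches the steady-state factor of 10 assumed here.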

I'm surprised to see python's median so low.  Are these benchmarks heavy
on bytecode-optimized Python, or do they tend to use standard interpreted
Python?


> 
> Now suppose your boss comes to you, as bosses do, and says, "well, all 
> them high-falutin' box plots are dandy, but the board of directors wants 
> one number for each language!" It turns out (and I'll let Google and 
> Wikipedia fill in the blanks for you) that the one number you want to 
> give your boss, aside from your cell phone number, is the *geometric 
> mean* of all the benchmark ratios. Again, smaller is better. So here's 
> how the languages stack up:
> 
> gcc     1.0
> yarv   13.2
> perl   14.0
> python 14.2
> php    15.5
> ruby   29.9
> jruby  55.0
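Why the geometric mean for ratios? It doesn't care which language you divide by: the geometric mean of A-over-B ratios equals A's geometric mean over gcc divided by B's geometric mean over gcc, so the ranking doesn't depend on the choice of baseline. The arithmetic mean lacks that property. The sketch below checks this on made-up timings (hypothetical numbers, not shootout data); it illustrates the technique and is not necessarily how the figures above were computed.

```python
import math
import statistics


def geo_mean(values):
    """Geometric mean as exp(mean(log x)); every value must be > 0."""
    logs = [math.log(v) for v in values]
    return math.exp(sum(logs) / len(logs))


# Hypothetical per-benchmark times in seconds (NOT shootout data).
gcc = [2.0, 4.0, 1.0]
lang_a = [30.0, 20.0, 45.0]
lang_b = [10.0, 80.0, 5.0]

ratios_a = [a / g for a, g in zip(lang_a, gcc)]   # A normalized to gcc
ratios_b = [b / g for b, g in zip(lang_b, gcc)]   # B normalized to gcc
a_vs_b = [a / b for a, b in zip(lang_a, lang_b)]  # A compared to B directly

if __name__ == "__main__":
    # Geometric mean: both routes to "A relative to B" agree.
    print(geo_mean(ratios_a) / geo_mean(ratios_b), geo_mean(a_vs_b))
    # Arithmetic mean: the two routes disagree.
    print(statistics.mean(ratios_a) / statistics.mean(ratios_b),
          statistics.mean(a_vs_b))
```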

That's a little clearer (barring the question of whether jruby benefits
from the optimizations for long-running processes), but of course it
suffers from a lack of attention to statistical outliers and low-end
performance curves.  That's one of the reasons I like box plots.


> 
> So yes, you can pretty much expect the same performance from YARV, Perl, 
> Python and PHP. And you can pretty much expect something like a 13- to
> 16-fold speed improvement if you decide to rewrite your application in C.
> 
> It's pretty clear to me from these numbers that the only reason that 
> deploying web applications on the LAMP stack and its cousins using 
> PostgreSQL, Perl, Python and Ruby is economically viable is that they 
> spend most of their time either in the database or on the network. The 
> bad news is that for an application with 100 percent dynamic language 
> code -- no C libraries for the intensive calculations, no highly-tuned 
> web servers, databases or other components -- you're going to end up 
> throwing twice as much hardware at scaling problems in Ruby as you will 
> in PHP, Perl or Python. The good news is that YARV will level the 
> playing field.
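The "twice as much hardware" figure follows from the geometric means listed earlier, but only under two assumptions: the workload is entirely CPU-bound in the language itself, and throughput scales linearly with hardware. A minimal sketch of that arithmetic (`hardware_factor` is a hypothetical helper):

```python
# Geometric-mean ratios vs. gcc, as listed earlier in this thread.
geo_means = {"yarv": 13.2, "perl": 14.0, "python": 14.2, "php": 15.5, "ruby": 29.9}


def hardware_factor(slow, fast, table=geo_means):
    """Relative CPU needed for `slow` vs. `fast`.

    Assumes 100% CPU-bound work in the language and linear scaling with
    hardware; real web workloads usually violate both assumptions.
    """
    return table[slow] / table[fast]


if __name__ == "__main__":
    for other in ("perl", "python", "php", "yarv"):
        print(f"ruby vs {other}: {hardware_factor('ruby', other):.2f}x")
```

This gives roughly 1.9x to 2.3x, which is where "twice as much hardware" comes from; the objection below is precisely that the CPU-bound assumption rarely holds.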

Unless you're implying a different set of working conditions, you're not
going to throw twice as much hardware at the Ruby implementation, because
the bottlenecks for dynamic language web development (assuming good
design) still haven't changed.  They're still I/O and network traffic of
various sorts, not the languages.

In my experience, that's pretty much the sort of thing that happens with
*everything* that is lax enough on performance needs to "settle" for
something like Perl, et al.  That's the neighborhood I'm talking about
for performance: not so slow it can't be used for a common desktop app,
not so fast that it should be used for hard-core number-crunching or
graphics-intensive game development.

Neighborhoods are relative, after all.  There's the neighborhood with
C/C++, OCaml, and so on, way up at the top; there's the neighborhood with
Perl, Python, Ruby, and so on, somewhere in the middle; there's the
neighborhood with the languages so slow nobody really uses them, way down
at the bottom.  You're more likely to find you need to make finer
distinctions in execution speed up near the top, where performance
*really* matters.


> 
> One final note to implementers and language designers ... Python gets an 
> extra little pat on the back from me for having such a low spread. YARV 
> and other Ruby implementations need to pay attention to the fact that 
> their boxes are wider than Perl's and PHP's and a *lot* wider than 
> Python's. In other words, look at the benchmarks where you really suck 
> first. :)

No kidding.

Those of us picking a language for a project might want to check the
specifics of where a language really sucks, too, to determine whether
that's going to be a problem -- but other than that, as long as your
performance needs are simple enough to allow for a dynamic language like
Perl et alii, you might as well just pick the one you like the best.  At
least, that's my take on the matter -- look for problem areas that are
deal-breakers, and otherwise don't worry about it too much unless you're
writing code that has to fit in eight bits and run like the dickens (for
example).  Anything else strikes me as a case of "premature
optimization", especially considering the difference a good algorithm can
make.


> 
> In case you want the numbers that go with the boxplots, here they are:
> 
>         gcc  yarv python  perl   php  ruby jruby
> Low       1   1.2    1.4  0.93   1.4   1.5   3.4
> Q1        1   4.8    4.6  2.90   3.1   6.3  12.0
> Median    1   8.7   14.0 26.00  31.0  34.0  50.0
> Q3        1  68.0   45.0 55.00  55.0 170.0 340.0
> High      1 150.0   98.0 67.00 110.0 380.0 410.0
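The box widths discussed above can be read off this table. A small sketch computing them two ways: as the linear distance Q3 - Q1, and as the multiplicative spread Q3/Q1, which is how the width appears on a logarithmic axis. Which axis the attached plot used isn't stated, so both are shown; `box_widths` is a hypothetical helper.

```python
# Quartiles transcribed from the table above (ratios vs. gcc).
q1 = {"yarv": 4.8, "python": 4.6, "perl": 2.90, "php": 3.1, "ruby": 6.3, "jruby": 12.0}
q3 = {"yarv": 68.0, "python": 45.0, "perl": 55.00, "php": 55.0, "ruby": 170.0, "jruby": 340.0}


def box_widths(q1, q3):
    """Return {lang: (linear width Q3 - Q1, log-scale width Q3 / Q1)}."""
    return {lang: (q3[lang] - q1[lang], q3[lang] / q1[lang]) for lang in q1}


if __name__ == "__main__":
    widths = box_widths(q1, q3)
    # Sort by linear width, narrowest box first.
    for lang, (lin, mult) in sorted(widths.items(), key=lambda kv: kv[1][0]):
        print(f"{lang:7s} Q3-Q1 = {lin:6.1f}   Q3/Q1 = {mult:5.1f}")
```

Python has the narrowest box on either scale, and ruby and jruby the widest. YARV's box is wider than Perl's and PHP's on a linear scale but narrower on a log scale, which bears on the question of whether a gcc-relative ratio exaggerates the upper part of the box.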

Thanks muchly for the statistics-wrangling.  It's instructive.

-- 
CCD CopyWrite Chad Perrin [ http://ccd.apotheon.org ]
Kent Beck: "I always knew that one day Smalltalk would replace Java.  I
just didn't know it would be called Ruby."