This study skipped something that seems rather important to me if he's
measuring compression: the ratios between compressed and uncompressed file
sizes.  By the standards of whatever compression algorithm is used, these tell
you which implementations contain the most information per byte (that is,
compression reduces the amount of redundant information).  Here are the ratios:

0.9092475589 ruby
0.9004711425 gawk
0.8968768415 mawk
0.8956336528 php
0.8530954879 lua
0.8501643964 pike
0.8285586392 perl
0.8230958231 icon
0.8136382889 njs
0.7974956822 xemacs
0.77109375 guile
0.764940239 python
0.7591514143 tcl
0.7422978177 gcc
0.7313994091 rep
0.730387448 g++
0.7094918504 java
0.7057591623 bigloo
0.7048085485 bash
0.6773229758 stalin
0.6671133744 erlang
0.6576839039 bigforth
0.6564399421 ocamlb
0.6564399421 ocaml
0.6501501502 ghc
0.6413418846 cmucl
0.6300151811 gforth
0.5898446557 se
0.5364870915 smlnj
0.5338424158 mercury
0.5319638455 mlton

Of course, bzip2 isn't an ideal compression algorithm, etc. etc., so this sort
of measure should be taken with a grain of salt.
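
For anyone who wants to reproduce this kind of figure, here is a minimal
Python sketch.  It assumes the ratio means bzip2-compressed size divided by
uncompressed size, which the list above does not spell out, and the function
names and the name-to-bytes mapping are hypothetical, made up for illustration:

```python
import bz2
from typing import Dict, List, Tuple


def compression_ratio(data: bytes) -> float:
    """Compressed size / uncompressed size (assumed definition of the ratio)."""
    if not data:
        raise ValueError("cannot compute a ratio for empty input")
    return len(bz2.compress(data)) / len(data)


def rank_by_ratio(sources: Dict[str, bytes]) -> List[Tuple[float, str]]:
    """Return (ratio, name) pairs sorted from highest ratio to lowest."""
    pairs = [(compression_ratio(src), name) for name, src in sources.items()]
    return sorted(pairs, reverse=True)
```

Feeding it something like {"ruby": ruby_bytes, "java": java_bytes}, where each
value is the concatenated source for that language, would give a ranking in the
same shape as the list above.  Very small inputs are dominated by bzip2's fixed
overhead, so their ratios say little.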

Another thing I noticed was the rather large size of the SML versions relative
to the others.  This surprises me somewhat, since any typical SML code that
I've ever written is more compact than the Java version of the same code, and
I'm a much better Java programmer (I had a class where I wrote equivalent
versions of a bunch of stuff in both languages).  Perhaps this reflects that
all the apps used in this test were quite simple?  This complaint probably
affects some of the other languages there as well.

-kyle

On Wed, Jul 31, 2002 at 11:22:28PM +0900, gunnar.andersson / telelogic.com wrote:
> Sorry if this has been posted already and I missed it.
> I thought it was interesting.
> 
> Summary:
> Source code size can be compared in different ways:
> in terms of lines of code (LOC), in bytes,
> or after compression, which gives a rough estimate
> of complexity because it filters out things like the
> length of keywords.  Interestingly, while some languages
> show very different ranks before and after compression,
> Ruby code is among the very shortest, both before and after
> compression.
> 
> http://zooko.com/shootout-compress.html
> 
> /Gunnar

-- 
http://mas.cs.umass.edu/~rawlins
--
I used to know the answer to that one, before I ate so many preservatives...
      (Zippy)
