Hi Wesley,

Thanks for posting the benchmark info!  I found it very interesting.
I ran the same benchmarks with Visual C++ on Windows 2000, and also
changed the C++ source to be more efficient.

Thursday, June 21, 2001, 2:55:04 PM, you wrote:

WJL> All data is in seconds.

WJL> System:
WJL>         HP Vectra, PIII 733MHz, 128MEG RAM
WJL>         Red Hat Linux 7.1

WJL> g++ 2.96
WJL> Ruby 1.6.4

WJL> SUMMARY:
WJL>                         C++     Tcl     Perl    Python  Ruby
WJL> v append:               .146    8.85    1.33    4.28    1.10
WJL> v append and read:      .212    14.8    3.04    6.15    2.30
WJL> v append and iterate:   .159    ----    2.13    5.21    1.65

I ran the C++ and Ruby tests on this system:
   Tyan Thunderbolt motherboard, dual PIII 500MHz, 1 Gig RAM
   Windows 2000 Professional SP 1

Visual C++ 6 SP4, Release builds (optimized)
Ruby 1.6.3

And got these results:
                         C++    C++     Ruby
                                resize
v append:               .093    .046    1.80
v append and read:      .109    .062    3.73
v append and iterate:   .109    .062    2.80

The first C++ column uses the same source you used.  (Well, actually I
had to make one small change.  When I first ran these tests, all three
tests, "append", "append and read" and "append and iterate" took
exactly the same amount of time.  This seemed odd, as the second and
third tests are clearly doing more work than the first test.  Upon
investigation, I found out that the optimizer wasn't generating any
code for the second loop in these tests:

  for (int i=0; i<1000000; i++) {
    int b = a[i];
  }

The optimizer realized that this loop had no effect on the results of
the program, and deleted it completely.  I fixed this by making b into
a global:

int b;
int main(int argc, char* argv[])
{
  // ...
  for (int i=0; i<1000000; i++) {
    b = a[i];
  }

This is a recurrent problem with C and C++ benchmarks, in that the
optimizer often deletes portions of the benchmark program.)

I also made a change to speed up the execution of the C++ code.  In
the original source, the elements are added to the vector like this:

   vector<int> a;
   for (int i=0; i<1000000; i++) {
     a.push_back(i);
   }

If the final size of the vector is known in advance, it is more
efficient to first resize the vector to the desired size:

  vector<int> a;
  a.resize(1000000);
  for (int i=0; i<1000000; i++) {
    a[i] = i;
  }

The "C++ resize" column above makes this change to the source, which
then runs about twice as fast.  (I have appended my modified sources below)

WJL> Even more fascinating is that the C++ implementation runs out of
WJL> memory at 16 million elements.  Ruby is able to create a vector of 25
WJL> million elements and read the contents back (unless Ruby is lying to
WJL> me).

I would expect C++ to behave like this.  The C++ Standard requires
vector implementations to be able to append an element to the end of
a vector in amortized constant time.  Since this will sometimes
cause a memory reallocation, which takes considerable time, the
obvious way to meet the requirement of the standard (and, as far as I
know, the way all implementations do it) is to allocate twice as much
memory every time the same vector is resized.  This way, the reallocation
happens after 2 elements, then after 4 elements, then after 8 elements,
etc.  This yields the infinite series 1/2 + 1/4 + 1/8 + 1/16 + ...
which asymptotically approaches 1, thus meeting the amortized
constant time requirement.

If your system has space for 25 million elements, then when the vector
grows beyond 16 million elements, C++/STL tries to allocate 32 million
elements, which fails.  (Actually, if it can't just extend the
existing memory block, it temporarily needs space for 16 + 32 million
elements.)

I tried this on my system, which has more RAM, and Ruby could allocate
an array of 30 million elements, but couldn't allocate 31 million
elements. C++ could allocate 133 million elements when adding them one
at a time with
     a.push_back(i);
     
When I tried doing
      a.resize(230000000);
C++ could allocate 230 million elements.

Thanks!!!

Wayne Vucenic
No Bugs Software
"High Reliability C++ and Ruby (!!!) Contract Programming in Silicon
Valley"


***************** VECTOR APPEND WITH RESIZE *********************
C++:                                   
#include <vector>
#include <iostream>
using namespace std;
int main() {
  vector<int> a;
  a.resize(1000000);
  for (int i=0; i<1000000; i++) {
          a[i] = i; 
  }
  cout << "finished" << endl;
  return 0;
}


***************** VECTOR APPEND AND READ WITH RESIZE *****************
C++:                                   
#include <vector>
#include <iostream>
using namespace std;
int b;
int main() {
  vector<int> a;
  a.resize(1000000);
  for (int i=0; i<1000000; i++) {
          a[i] = i; 
  }
  for (i=0; i<1000000; i++) {
    b = a[i];
  }
  cout << "finished" << endl;
  return 0;
}


*****************  VECTOR APPEND AND ITERATE WITH RESIZE *************
C++:                                           
#include <vector>
#include <iostream>
using namespace std;
int b;
int main() {
  vector<int> a;
  a.resize(1000000);
  for (int i=0; i<1000000; i++) {
          a[i] = i; 
  }
  vector <int>::iterator i1, iend1 = a.end();
  for (i1 = a.begin(); i1!=iend1; i1++) {
    b = *i1;
  }
  cout << "finished" << endl;
  return 0;
}