On Sat, May 26, 2001 at 06:10:05PM +0900, Stefan Matthias Aust wrote:
> I compared Ruby 1.6.4 against JDK 1.4 beta and cannot confirm your
> results.  (1.4 includes a regex package which looks like the one from
> the Jakarta project so I used the built-in version)
> 
> My ruby code needs some 4.7s to run (I used cygwin's time command)
> My java code needs some 2.7s to run (again measured with time)

Hm.  I was comparing Ruby 1.6.4 against IBM's Java 1.3 on Linux.
My benchmarks were from my office computer (200MHz CPU, 128MB RAM).
I put in benchmarking code (to, as you said, remove the VM initialization
time) and timed everything again.  I still got:

Java runtime: 14.794s	(Sun's JDK 1.3 w/ Hotspot)
Java runtime: 15.070s	(IBM's JDK 1.3)
Ruby runtime:  4.707s	(Ruby 1.6.4)
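
For reference, in-process timing of this kind (excluding interpreter
startup) can be sketched roughly as below.  This is only an illustration:
it assumes a current Ruby with the standard 'benchmark' library, and the
run_matches method, the pattern, and the test string are hypothetical
stand-ins, not the actual contents of regtest.rb.

```ruby
require 'benchmark'

# Hypothetical stand-in for the regex workload; the real regtest.rb
# is not shown in this thread.
def run_matches(iterations)
  pattern = /(\w+)@(\w+)\.com/
  text = "contact: user@example.com"
  hits = 0
  iterations.times do
    hits += 1 if pattern =~ text
  end
  hits
end

# Time only the matching loop, so interpreter startup is excluded.
hits = nil
elapsed = Benchmark.realtime { hits = run_matches(100_000) }
printf("Runtime: %.6f (%d matches)\n", elapsed, hits)
```

The same idea applies on the Java side: take a timestamp before and after
the loop, so that VM startup is not part of the figure being compared.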

Interestingly, the VM initialization and other code overhead only added
about another second on my machine, according to 'time'.  I don't know 
how much to read into that; Ruby consistently reported a benchmark
time *greater* than the total from 'time', which should not be possible:

ser@ender ~% time ruby regtest.rb
Runtime: 4.703741
ruby regtest.rb  4.33s user 0.05s system 95% cpu 4.581 total
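
One way to dig into a discrepancy like this is to record wall-clock and
CPU time around the same block inside the script, and compare both with
what 'time' prints.  A minimal sketch, assuming a current Ruby; the
measure helper and the workload in the block are hypothetical and are not
taken from regtest.rb:

```ruby
# Record wall-clock time and user/system CPU time around a block.
def measure
  cpu_before = Process.times
  wall_before = Time.now
  yield
  wall = Time.now - wall_before
  cpu_after = Process.times
  {
    wall: wall,
    user: cpu_after.utime - cpu_before.utime,
    system: cpu_after.stime - cpu_before.stime
  }
end

# Hypothetical workload standing in for the real benchmark loop.
result = measure { 200_000.times { "abc123" =~ /\d+/ } }
printf("wall %.3fs  user %.3fs  system %.3fs\n",
       result[:wall], result[:user], result[:system])
```

If the in-script wall-clock figure still exceeds the total from 'time',
the two are probably reading different clocks or clocks with different
resolution.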

The gnu.regex package, BTW, wasn't even in the running.  A single iteration
took longer than a complete run of the org.apache version.

> Then, the ruby code still needs 4.3s but the Java code only 1.9s.

I'm *really* suspicious of this.  Have you tried compiling Ruby with a
different compiler?  I'd also consider whether or not Sun's regex package 
is as complex or complete as the Ruby (or even org.apache or gnu.regex) 
packages.  Typically, Sun's implementations are extremely spartan 
(e.g., Sun Collections compared to ORO's Collections).

> So don't worry about the speed of the regexp code.  I also doubt that
> this would be the most difficult part for a Ruby interpreter written
> in Java.  To get it compatible with all "features" of the C one is
> much more challenging IMHO.

I agree with this statement, although speed *is* an issue.  The main
reason that Java hasn't become the de facto application language is the
speed issue (IMO).

> JIT-compiled code is typically faster than you'd expect.  I don't want

Not than what *I'd* expect.  I'm constantly disappointed in JIT performance,
although my big beef with Java at the moment is the memory constraints.
Note that I am currently a (professional) Java applications developer, and 
have been for the past six years (I got lucky with an employer
who embraced Java when it was still alpha -- 0.9).  I use it because I
still believe it is better than anything else out there, but I've never
been pleasantly surprised by its performance.

> to go into details, but a sophisticated compiler like the SELF VM
> which was the precursor of the Hotspot VM is able to generate faster
> code than a static compiler can because of its ability to dynamically
> recompile code based on type heuristics generated while the program is
> running.

I'd be willing to debate this point.  The optimization algorithms in most
C compilers are highly tuned.  Heuristic JIT compilers typically do a
good job of making the JIT itself efficient by determining *what* to compile,
but don't improve much on the basic optimization algorithms themselves.  There
are only so many ways to optimize a simple loop, and unless Matz's regex
code is really inefficient, I'd say there is some other factor at work
that is causing the results you are seeing.

Fundamentally, Java has a lot of overhead that slows things down.  I don't 
care if you hand-tune this overhead in assembly; it is still going to be
slower than the same code without the overhead (object allocation, garbage
collection, exception handling, etc).  While I don't believe that Java is
as slow as a lot of people think it is, I have yet to see any evidence of
a Java application outperforming a native, compiled application -- your
benchmarks aside :-)

With most of the time (of this program) being spent in native code on the
Ruby side, if we can prove that your benchmarks are not the result of a
fluke (Cygwin is anti-optimizing, Sun is cheating by having the regex
code in a native library, or some such nonsense), then I'd suggest that we
take a look at the Ruby regex code and look for speedbumps.

--- SER