On Thu, Jul 27, 2006 at 04:27:57AM +0900, Ashley Moran wrote:
> 
> I'm late to this conversation but I've been interested in Ruby  
> performance lately.  I just had to write a script to process about  
> 1-1.5GB of CSV data  (No major calculations, but it involves about 20  
> million rows, or something in that region).  The Ruby implementation  
> I wrote takes about 2.5 hours to run - I think memory management is  
> the main issue as the manual garbage collection run I added after  
> each file goes into several minutes for the larger sets of data.  As  
> you can imagine, I am more than eager for YARV/Rite.
> 
> Anyway, my question really is that I thought a VM was a prerequisite  
> for JIT?  Is that not the case?  And if the YARV VM is not the way to  
> go, what is?

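As an aside, for a job like the one you describe, streaming the rows
rather than slurping whole files, with an explicit GC run between
files, is roughly what I'd picture -- an untested sketch (the glob
pattern and the per-row work are made up, obviously):

  require 'csv'

  # Stream each file row by row instead of loading it whole, and force
  # a collection between files (the part you said was taking minutes).
  Dir.glob('data/*.csv') do |path|
    CSV.foreach(path) do |row|
      # ... per-row processing goes here ...
    end
    GC.start  # explicit garbage collection between files
  end

That won't make the GC itself any cheaper, but keeping rows from piling
up between files is usually the first thing I'd check.
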
The canonical example for comparison, I suppose, is the Java VM vs. the
Perl JIT compiler.  In Java, the source is compiled to bytecode and
stored.  In Perl, the source remains in source form, and is stored as
ASCII (or whatever).  Java bytecode is compiled for a virtual computer
system (the "virtual machine"): when execution happens with Java, the VM
runs that bytecode as though it were native binary compiled for this
virtual machine.  From the perspective of the OS, however, the virtual
machine is itself an interpreter.  Thus, Java is generally half-compiled
and half-interpreted, which speeds up the interpretation process.
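
To put that in Ruby terms, this half-compiled form is essentially what
YARV introduces.  On a YARV-based ruby you should be able to compile a
snippet and inspect the resulting instruction sequence -- something like
the following sketch (the RubyVM::InstructionSequence name is my
assumption about what the interface ends up being called):

  # Compile a snippet to VM bytecode and dump the instructions.
  iseq = RubyVM::InstructionSequence.compile("puts 1 + 2")
  puts iseq.disasm   # the half-compiled form the VM actually interprets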

When execution happens in Perl 5.x, on the other hand, a compiler runs
at execution time, compiling the source into an executable form.  It
does so in stages, however, to allow for the dynamic runtime effects of
Perl to take place -- which is one reason the JIT compiler is generally
preferable to a compiler of persistent binary executables in the style
of C.  Perl is, thus, technically a compiled language, and not an
interpreted language like Ruby.

Something akin to bytecode compilation could be used to improve the
execution speed of Perl programs without abandoning the JIT-compiled
execution model it currently uses, and without giving up any of Perl's
dynamic runtime capabilities.  This would involve
running the first (couple of) pass(es) of the compiler to produce a
persistent binary compiled file with the dynamic elements still left in
an uncompiled form, to be JIT-compiled at execution time.  That would
probably grant the best performance available for a dynamic language,
and would avoid the overhead of a VM implementation.  It would, however,
require some pretty clever programmers to implement in a sane fashion.

I'm not entirely certain that would be appropriate for Ruby, considering
how much of the language ends up being dynamic in implementation, but it
bothers me that it doesn't even seem to be up for discussion.  In fact,
Perl is heading in the direction of a VM implementation with Perl 6,
despite the performance successes of the Perl 5.x compiler.  Rather than
improve upon an implementation that is working brilliantly, they seem
intent upon tossing it out and creating a different implementation
altogether that, as far as I can see, doesn't hold out much hope for
improvement.  I could, of course, be wrong about that, but that's how it
looks from where I'm standing.

It just looks to me like everyone's chasing VMs.  While the nontrivial
problems with Java's VM are in many cases specific to the Java VM (the
Smalltalk VMs have tended to be rather better designed, for instance),
there are still issues inherent in the VM approach as currently
envisioned, and as such it leaves sort of a bad taste in my mouth.

I think I've rambled.  I'll stop now.

-- 
CCD CopyWrite Chad Perrin [ http://ccd.apotheon.org ]
"There comes a time in the history of any project when it becomes necessary
to shoot the engineers and begin production." - MacUser, November 1990