On Thu, Jul 27, 2006 at 06:24:49AM +0900, Charles O Nutter wrote:
> On 7/26/06, Chad Perrin <perrin / apotheon.com> wrote:
> >
> >The canonical example for comparison, I suppose, is the Java VM vs. the
> >Perl JIT compiler.  In Java, the source is compiled to bytecode and
> >stored.  In Perl, the source remains in source form, and is stored as
> >ASCII (or whatever).  When execution happens with Java, the VM actually
> >interprets the bytecode.  Java bytecode is compiled for a virtual
> >computer system (the "virtual machine"), which then runs the code as
> >though it were native binary compiled for this virtual machine.  That
> >virtual machine is, from the perspective of the OS, an interpreter,
> >however.  Thus, Java is generally half-compiled and half-interpreted,
> >which speeds up the interpretation process.
> 
> 
> Half true. The Java VM could be called "half-compiled and half-interpreted"
> at runtime for only a short time, and only if you do not consider VM
> bytecodes to be a valid "compiled" state. However most bytecode is very
> quickly compiled into processor-native code, making those bits fully
> compiled. After a long enough runtime (not very long in actuality), all Java
> code is running as native code for the target processor (with various
> degrees of optimization and overhead).

True . . . but for short runs this results in fairly abysmal
performance, all things considered.  Also, see below regarding dynamic
programming.
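
The warmup cost is easy to picture: a JIT typically interprets a method
until some call-count threshold is crossed, and only then spends time
compiling it.  Here's a conceptual sketch in Ruby -- the class name,
threshold, and "compiled" path are all invented for illustration, not
how any real JIT is written:

```ruby
# Conceptual sketch of JIT warmup: a method body runs on a slow
# "interpreted" path until it has been called THRESHOLD times,
# after which a fast "compiled" path takes over.  All names here
# are illustrative.
class TinyJit
  THRESHOLD = 10

  def initialize(interpreted, compiled)
    @interpreted = interpreted   # slow path: a Proc
    @compiled    = compiled      # fast path: a Proc
    @calls       = 0
  end

  def call(*args)
    @calls += 1
    if @calls > THRESHOLD
      @compiled.call(*args)      # hot: run the optimized version
    else
      @interpreted.call(*args)   # cold: pay interpretation overhead
    end
  end
end

square = TinyJit.new(
  ->(n) { sleep(0.001); n * n },  # pretend interpretation is slow
  ->(n) { n * n }                 # pretend this is native code
)
# A short run never crosses the threshold, so it pays the slow path
# on every single call -- which is the point about short runs.
```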


> 
> The difference between AOT compilation with GCC or .NET is that Java's
> compiler can make determinations based on runtime profiling about *how* to
> compile that "last mile" in the most optimal way possible. The bytecode
> compilation does, as you say, primarily speed up the interpretation process.
> However it's far from the whole story, and the runtime JITing of bytecode
> into native code is where the magic lives. To miss that is to miss the
> greatest single feature of the JVM.

This also is true, but that benefit is entirely unusable for highly
dynamic code, unfortunately -- and, in fact, even bytecode compilation
might be too much to ask of especially dynamic code.  I suppose that's
something for pointier heads than mine, since I'm not actually a
compiler-writer or language-designer (yet).  It's also worth noting that
this isn't accomplishing anything that isn't also accomplished by the
Perl JIT compiler.
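
To make "dynamic" concrete: methods that don't exist until runtime give
an ahead-of-time compiler nothing to compile.  A small Ruby sketch (the
class, field names, and lookup logic are invented):

```ruby
# Nothing here can be resolved at compile time: the method names are
# assembled from data, and the method bodies are generated while the
# class body executes at runtime.
class Record
  FIELDS = [:name, :email]

  FIELDS.each do |field|
    define_method("find_by_#{field}") do |value|
      # invented lookup logic, just to give the method a body
      "looking up record where #{field} = #{value.inspect}"
    end
  end
end

r = Record.new
r.find_by_email("chad@example.com")
# An AOT compiler sees only the FIELDS.each loop; the find_by_*
# methods come into existence only when the class body runs.
```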


> 
> >When execution happens in Perl 5.x, on the other hand, a compiler runs
> >at execution time, compiling executable binary code from the source.  It
> >does so in stages, however, to allow for the dynamic runtime effects of
> >Perl to take place -- which is one reason the JIT compiler is generally
> >preferable to a compiler of persistent binary executables in the style
> >of C.  Perl is, thus, technically a compiled language, and not an
> >interpreted language like Ruby.
> 
> I am not familiar with Perl's compiler. Does it compile to processor-native
> code or to an intermediate bytecode of some kind?

There is no intermediate bytecode step for Perl, as far as I'm aware.
It's not a question I've directly asked one of the Perl internals
maintainers, but everything I know about the Perl compiler confirms my
belief that it simply does compilation to machine code.


> 
> We're also juggling terms pretty loosely here. A compiler converts
> human-readable code into machine-readable code. If the "machine" is a VM,
> then you're fully compiling. If the VM code later gets compiled into "real
> machine" code, that's another compile cycle. Compilation isn't as cut and
> dried as you make it out to be, and claiming that, for example, Java is
> "half compiled" is just plain wrong.

Let's call it "virtually compiled", then, since it's being compiled to
code that is readable by a "virtual machine" -- or, better yet, we can
call it bytecode and say that it's not fully compiled to physical
machine-readable code, which is what I was trying to explain in the
first place.
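
In fact, Ruby versions after this discussion (1.9 and later) expose the
YARV compiler directly, which makes the distinction easy to see: source
compiles to instructions for Ruby's virtual machine, not to anything a
physical processor executes.  A sketch, assuming a Ruby where
RubyVM::InstructionSequence is available:

```ruby
# Compile a snippet to YARV bytecode and disassemble it.  The
# instruction names in the listing (putobject, opt_plus, ...) belong
# to Ruby's virtual machine, not to any physical CPU -- which is the
# sense in which the result is "virtually compiled".
iseq = RubyVM::InstructionSequence.compile("1 + 2")
puts iseq.disasm
# The listing shows VM instructions such as putobject and opt_plus;
# a native compiler would emit x86/ARM machine code here instead.
```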


> 
> >Something akin to bytecode compilation could be used to improve upon the
> >execution speed of Perl programs without diverging from the
> >JIT-compilation execution it currently uses and also without giving up
> >any of the dynamic runtime capabilities of Perl.  This would involve
> >running the first (couple of) pass(es) of the compiler to produce a
> >persistent binary compiled file with the dynamic elements still left in
> >an uncompiled form, to be JIT-compiled at execution time.  That would
> >probably grant the best performance available for a dynamic language,
> >and would avoid the overhead of a VM implementation.  It would, however,
> >require some pretty clever programmers to implement in a sane fashion.
> 
> There are a lot of clever programmers out there.

True, of course.  The problem is getting them to work on a given
problem.


> 
> Having worked heavily on a Ruby implementation, I can say for certain that
> 99% of Ruby code is static. There are some dynamic bits, especially within
> Rails where methods are juggled about like flaming swords, but even these
> dynamic bits eventually settle into mostly-static sections of code.

I love that imagery, with the flaming sword juggling.  Thanks.
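
One reason the juggling settles down: a common Rails-era idiom handles a
dynamic call once in method_missing and then defines a real method, so
every later call is ordinary static dispatch.  A sketch (the class and
attribute names are invented):

```ruby
# The first call to an unknown getter goes through method_missing,
# which installs a real method; every subsequent call bypasses
# method_missing entirely -- the dynamic bit has "settled" into
# static code.
class Model
  def initialize(attrs)
    @attrs = attrs
  end

  def method_missing(name, *args)
    if @attrs.key?(name)
      self.class.send(:define_method, name) { @attrs[name] }
      send(name)
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    @attrs.key?(name) || super
  end
end

m = Model.new(name: "Chad")
m.name          # defined on the fly here...
m.name          # ...and a plain method call from now on
```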


> Compilation of Ruby code into either bytecode for a fast interpreter engine
> like YARV or into bytecode for a VM like Java is therefore perfectly valid
> and very effective. Preliminary compiler results for JRuby show a boost of
> 50% performance over previous versions, and that's without optimizing many
> of the more expensive Ruby operations (call logic, block management).
> Whether a VM is present (as in JRuby) or not (as may be the case with YARV),
> eliminating the overhead of per-node interpretation is a big positive. JRuby
> will also feature a JIT compiler to allow running arbitrary .rb files
> directly, optimizing them as necessary and as seems valid based on runtime
> characteristics. I don't know if YARV will do the same, but it's a good
> idea.

I'm sure a VM or similar approach (and, frankly, I do prefer the
fast-interpreter approach over the VM approach) would provide ample
opportunity to improve upon Ruby's current performance, but that doesn't
necessarily mean it's better than other approaches to improving
performance.  That's where I was aiming.


> 
> The whole VM thing is such a small issue. Ruby itself is really just a VM,
> where its instructions are the elements in its AST. The definition of a VM
> is sufficiently vague to include most other interpreters in the same
> family. Perhaps you are specifically referring to VMs that provide a set of
> "processor-like" fine-grained operations, attempting to simulate some sort
> of magical imaginary hardware? That would describe the Java VM pretty well,
> though in actuality there are real processors that run Java bytecodes
> natively as well. Whether or not a language runs on top of a VM is
> irrelevant, especially considering JRuby is a mostly-compatible version of
> Ruby running on top of a VM. It matters much more that translation to
> whatever underlying machine....virtual or otherwise...is as optimal and
> clean as possible.

A dividing line between "interpreter" and "VM" has always seemed rather
more clear to me than you make it sound.  Yes, I do refer to a
simulation of an "imaginary" (or, more to the point, "virtual") machine,
as opposed to a process that interprets code.  Oh, wait, there's that
really, really obvious dividing line I keep seeing.

The use (or lack) of a VM does indeed matter: it's an implementation
detail, and implementation details make a rather significant difference
in performance.  The ability of the interpreter to quickly execute
what's fed to it is important, as you indicate, but so too is the
ability of the interpreter itself to run quickly -- unless your program
is actually compiled to machine-native code for the hardware, in which
case the fact that no interpreter needs to execute at runtime at all is
itself significant.
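
That dividing line can be made concrete with a toy arithmetic language
implemented both ways: a tree-walking interpreter executes the AST
directly, while a VM first compiles the AST to instructions for an
imaginary stack machine and then dispatches those.  Everything below is
invented for illustration:

```ruby
# AST as nested arrays, e.g. [:add, [:num, 1], [:mul, [:num, 2], [:num, 3]]]

# 1. Tree-walking interpreter: executes the AST node by node.
def interpret(node)
  case node[0]
  when :num then node[1]
  when :add then interpret(node[1]) + interpret(node[2])
  when :mul then interpret(node[1]) * interpret(node[2])
  end
end

# 2. Compiler: flattens the AST into instructions for an imaginary
#    stack machine -- the "virtual" in virtual machine.
def compile(node, code = [])
  case node[0]
  when :num then code << [:push, node[1]]
  when :add then compile(node[1], code); compile(node[2], code); code << [:add]
  when :mul then compile(node[1], code); compile(node[2], code); code << [:mul]
  end
  code
end

# 3. VM: dispatches those instructions against a stack.
def run_vm(code)
  stack = []
  code.each do |op, arg|
    case op
    when :push then stack.push(arg)
    when :add  then b = stack.pop; a = stack.pop; stack.push(a + b)
    when :mul  then b = stack.pop; a = stack.pop; stack.push(a * b)
    end
  end
  stack.pop
end

ast = [:add, [:num, 1], [:mul, [:num, 2], [:num, 3]]]
interpret(ast)          # => 7, by walking the tree
run_vm(compile(ast))    # => 7, via "virtual machine" instructions
```

Both produce the same answer; the difference is that the second pays a
one-time compilation cost in exchange for cheaper dispatch afterward.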

-- 
CCD CopyWrite Chad Perrin [ http://ccd.apotheon.org ]
Brian K. Reid: "In computer science, we stand on each other's feet."