I'm not jumping back into this in a big way, since it's obvious
Ruby-on-Parrot isn't intriguing to enough people to make it worth the
attention.

But on this point of register vs. stack machines: most of the people
writing to the VM are language implementors, not ordinary users, and
they have a lot of sophisticated things to learn anyway. Most real
hardware supports the register-file model either natively, in
microcode, or with opcodes. I'm not convinced that the stack-based
model necessarily has any advantage over the register-based one, and
my intuition says that other factors in the machine architecture are
more important.
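
To make the contrast concrete, here's a toy sketch in Ruby of the
same expression, a + b * c, run on both kinds of machine. The opcode
names and both interpreters are invented for illustration; they don't
correspond to Parrot, YARV, the JVM, or any real VM.

```ruby
# Stack machine: operands are implicit, always the top of the stack.
def run_stack(program, env)
  stack = []
  program.each do |op, arg|
    case op
    when :push
      stack.push(env.fetch(arg))
    when :add, :mul
      rhs = stack.pop
      lhs = stack.pop
      stack.push(op == :add ? lhs + rhs : lhs * rhs)
    else
      raise ArgumentError, "unknown opcode #{op.inspect}"
    end
  end
  stack.pop
end

# Register machine: each instruction names its destination and sources.
def run_register(program, regs)
  regs = regs.dup # leave the caller's registers untouched
  program.each do |op, dst, lhs, rhs|
    case op
    when :add then regs[dst] = regs.fetch(lhs) + regs.fetch(rhs)
    when :mul then regs[dst] = regs.fetch(lhs) * regs.fetch(rhs)
    else raise ArgumentError, "unknown opcode #{op.inspect}"
    end
  end
  regs
end

env = { a: 2, b: 3, c: 4 }

# a + b * c as five small stack instructions...
stack_code = [[:push, :a], [:push, :b], [:push, :c], [:mul], [:add]]
# ...or as two wider register instructions.
register_code = [[:mul, :t0, :b, :c], [:add, :t0, :a, :t0]]

run_stack(stack_code, env)            # => 14
run_register(register_code, env)[:t0] # => 14
```

The stack version dispatches five instructions and the register
version two, but each register instruction carries more operands to
decode. Which of those costs dominates depends on the rest of the
machine, and a toy like this can't settle it.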

Most of the recent work in VM optimization has taken place in the JVM
context, and involves extreme cleverness in regard to what code should
be compiled or recompiled to native opcodes, and when. But look at the
older literature, from before the release of Java, and you'll find a
lot of stuff that is probably more germane to dynamic-language
processing (much of it done in the Smalltalk context, but still
generic): things like detecting and hard-compiling code that is not
likely to be metaprogrammed, and tuning these code paths heuristically
during the run.
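
A rough Ruby-level analogy of that idea, as I understand it: take
the fast path for as long as nobody metaprograms the class, and throw
the fast path away the moment somebody does. CallSite and Watched are
hypothetical names made up for this sketch. Real VMs do this inside
the interpreter or compiler, not in user code, and their invalidation
is far more selective than "flush everything."

```ruby
# Hypothetical call site that caches method lookup per receiver class.
class CallSite
  @sites = []

  class << self
    attr_reader :sites

    # Crude invalidation: forget every cached lookup everywhere.
    def flush_all
      sites.each(&:flush)
    end
  end

  attr_reader :hits, :misses

  def initialize(name)
    @name = name
    @cache = {}
    @hits = 0
    @misses = 0
    self.class.sites << self
  end

  def call(receiver, *args)
    klass = receiver.class
    meth = @cache[klass]
    if meth
      @hits += 1   # fast path: lookup already resolved
    else
      @misses += 1 # slow path: resolve and remember
      meth = @cache[klass] = klass.instance_method(@name)
    end
    meth.bind(receiver).call(*args)
  end

  def flush
    @cache.clear
  end
end

# Extending a class with this makes any later method definition on it
# (def or define_method) invalidate the cached lookups.
module Watched
  def method_added(name)
    super
    CallSite.flush_all
  end
end

class Point
  extend Watched

  def initialize(x)
    @x = x
  end

  def scale(k)
    @x * k
  end
end
```

Calling a CallSite.new(:scale) twice on the same Point gives one miss
and then one hit. Redefining scale with define_method flushes the
cache, so the next call misses again and picks up the new definition.
The sketch ignores singleton methods and included modules, which is
exactly the sort of thing a real implementation has to get right.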

My bottom line is this: I don't care about Perl, not even a little
tiny bit. But I'm not convinced that we've asked enough questions
about the ideal VM characteristics for supporting highly-dynamic
languages. I know little about Rubinius and have not conversed with
Evan. I'm quite impressed with Charles' work to date on JRuby, but the
JVM may not be the ideal platform for running Ruby. I don't think
we've heard the last word.

And Ruby is in a distinct class compared to other dynamic languages
because of the unusual degree to which it encourages and benefits from
metaprogramming. No, that's not to say other languages can't do
similar things. It is to say that this is part of "the Ruby way," and
I find Ruby handicapped by the huge performance penalty that
metaprogramming often imposes.

Concurrency primitives: at the end of the day, there is only one
primitive that is really required, and it MUST be supplied by the
hardware, not the VM. That's the atomic check-and-set operation, of
course. Threading packages for C can be and often are supplied as
pure-userland libraries.