M. Edward (Ed) Borasky wrote:
>> The byte-codes also go in
>> d-cache while the interpreter itself is in I-cache.
> You need a very carefully designed inner interpreter for this to be 
> useful.

Good stuff, Ed, but not really what I meant.
They're modifying direct-threaded code to
aggregate common sequences of functions AIUI,
whereas I wasn't really talking about threaded
code at all, but byte-code. I've used aggressive
inlining to build an interpreter with nearly all
the primitives in one function, leaving normal
C register variables available as registers, and
found that worked quite well (for emulating a
small microprocessor on a 386, rather than for
byte-code). The interesting thing is what a good
compiler can do with such a large function if
it's built this way. You can avoid most call
overhead and have a compact switch table if you
have a well-designed byte-code. Even if the
byte-code is so dense that each code has to be
examined several times before it can be executed,
that isn't a problem once it's in cache: the very
next thing you're often going to do is fetch more
data or byte-code, and you'll have to wait for
that anyway, so spending some of those idle CPU
cycles decoding the byte-code doesn't hurt much.
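For illustration only, here is a minimal C sketch of that style of interpreter. It assumes a made-up one-byte instruction format (3-bit opcode, 5-bit operand) and a tiny accumulator machine; none of the names (vm_run, OP_*, INSN, demo_sum) come from the emulator described above. All the primitives sit in one switch inside one function, so the compiler is free to keep acc, pc and the zero flag in machine registers, and each byte is decoded more than once (opcode field, then operand field) rather than pre-decoded.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical dense byte-code: one byte per instruction.
 * Top 3 bits = opcode, low 5 bits = operand (small immediate,
 * register number in the low 3 bits, or backward jump distance). */
enum {
    OP_HALT = 0, /* stop; result is the accumulator          */
    OP_LDI  = 1, /* acc = imm5                               */
    OP_ADDI = 2, /* acc += imm5                              */
    OP_ST   = 3, /* reg[n] = acc                             */
    OP_ADD  = 4, /* acc += reg[n]                            */
    OP_DEC  = 5, /* reg[n] -= 1; zero = (reg[n] == 0)        */
    OP_JNZ  = 6, /* if !zero: pc -= imm5 (pc already past it) */
    OP_LD   = 7  /* acc = reg[n]                             */
};

#define INSN(op, arg) ((uint8_t)(((op) << 5) | ((arg) & 0x1F)))
#define VM_BAD_BRANCH UINT32_MAX /* returned if a jump leaves the code */

/* Field extractors; small enough that an optimising compiler
 * normally folds them straight into the dispatch switch. */
static inline unsigned op_of(uint8_t b)  { return b >> 5; }
static inline unsigned arg_of(uint8_t b) { return b & 0x1Fu; }
static inline unsigned reg_of(uint8_t b) { return b & 0x07u; }

/* The whole interpreter is one function: no per-primitive calls.
 * Unsigned arithmetic is used so overflow wraps instead of being UB. */
uint32_t vm_run(const uint8_t *code, size_t len)
{
    uint32_t acc = 0;
    uint32_t reg[8] = {0}; /* indexed, so likely kept in memory */
    int zero = 0;
    size_t pc = 0;

    while (pc < len) {
        uint8_t b = code[pc++];  /* fetch once ...            */
        switch (op_of(b)) {      /* ... decode the opcode ... */
        case OP_HALT: return acc;
        case OP_LDI:  acc = arg_of(b); break;  /* ... then the operand */
        case OP_ADDI: acc += arg_of(b); break;
        case OP_ST:   reg[reg_of(b)] = acc; break;
        case OP_ADD:  acc += reg[reg_of(b)]; break;
        case OP_DEC:  zero = (--reg[reg_of(b)] == 0); break;
        case OP_JNZ:
            if (!zero) {
                size_t back = arg_of(b);
                if (back > pc)
                    return VM_BAD_BRANCH; /* would jump before start */
                pc -= back;
            }
            break;
        case OP_LD:   acc = reg[reg_of(b)]; break;
        }
    }
    return acc; /* running off the end behaves like HALT */
}

/* Sum 5+4+3+2+1 with a countdown loop in register 0. */
static const uint8_t demo_sum[] = {
    INSN(OP_LDI, 5), INSN(OP_ST, 0), INSN(OP_LDI, 0),
    INSN(OP_ADD, 0),          /* loop: acc += r0          */
    INSN(OP_DEC, 0),          /* r0 -= 1                  */
    INSN(OP_JNZ, 3),          /* back to the ADD if r0!=0 */
    INSN(OP_HALT, 0)
};

/* 31 + 11, using only immediates. */
static const uint8_t demo_const[] = {
    INSN(OP_LDI, 31), INSN(OP_ADDI, 11), INSN(OP_HALT, 0)
};
```

With these definitions, vm_run(demo_sum, sizeof demo_sum) returns 15 and vm_run(demo_const, sizeof demo_const) returns 42. The one-byte format is deliberately cramped to show the trade-off described above: a few extra shifts and masks per instruction, in exchange for more code fitting in cache.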

Clifford Heath.