M. Edward (Ed) Borasky wrote:
>> The byte-codes also go in d-cache while the interpreter itself is in I-cache.
> You need a very carefully designed inner interpreter for this to be useful.

Good stuff, Ed, but not really what I meant. AIUI they're modifying direct-threaded code to aggregate common sequences of functions, whereas I wasn't talking about threaded code at all, but about byte-code.

I've used aggressive inlining to build an interpreter with nearly all the primitives in one function, which leaves the ordinary C local variables free to live in machine registers. I found that worked quite well (for emulating a small microprocessor on a 386, rather than for byte-code). The interesting thing is what a good compiler can do with such a large function when it's built this way: if the byte-code is well designed, you avoid most call overhead and get a compact switch table.

Even if the byte-code is highly dense, so that each code has to be examined several times before it can be executed, that isn't a problem once it's in cache. The very next thing you're often going to do is fetch more data or byte-code, and you'll have to wait for that anyway, so spending some of those CPU cycles decoding the byte-code doesn't hurt much.

Clifford Heath.