Lionel Bouton wrote:
> These are CPU-level profiling tools, as:
> - the CPU is the same in and out of UML,
> - I certainly don't have access to the performance counters from 
> user-mode-linux,
> they won't be of much use for me (profiling the behavior of 
> user-mode-linux is not what I'm after).

I disagree here.

1. User-mode Linux is a guest in some host. (Gentoo, wasn't it?) In a 
sense, UML is the application, even though it's executing code on behalf 
of the benchmark, which is the application you care about. So profiling 
the host with *oprofile* will tell you what the whole host is doing, 
including the UML guest and the benchmark within it.

2. The actual physical processor(s) will be the same in and out of UML, 
yes. However, what the "OS" does with those processors, especially with 
respect to caches, will be different. Other things could differ between 
the two runs as well, like branch prediction behavior, or the system 
call possibility you noted before. oprofile and the CodeAnalyst wrapper 
will tell you how efficiently the processor is being used in both cases.
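
To make "how efficiently" concrete, here is one rough way to compare two 
profiling runs once you have per-event totals for each. This is only a 
sketch: the event names and the dictionary layout are my own invention 
for illustration, not oprofile's event names or output format, so you 
would have to fill the dictionaries in by hand from whatever the 
profiler reports.

```python
# Sketch only: event names and the dict layout are hypothetical,
# not oprofile's actual event names or report format.

def per_instruction_rates(samples, base_event="instructions"):
    """Divide every event count by the count of the base event."""
    base = samples[base_event]
    if base <= 0:
        raise ValueError("base event count must be positive")
    return {
        event: count / base
        for event, count in samples.items()
        if event != base_event
    }


def compare_runs(native, uml, base_event="instructions"):
    """Return, per event, the UML rate divided by the native rate.

    A value above 1.0 means the event happens more often per
    instruction under UML than on the host kernel. Events missing
    from either run, or with a zero native rate, are skipped.
    """
    native_rates = per_instruction_rates(native, base_event)
    uml_rates = per_instruction_rates(uml, base_event)
    ratios = {}
    for event in native_rates.keys() & uml_rates.keys():
        if native_rates[event] > 0:
            ratios[event] = uml_rates[event] / native_rates[event]
    return ratios
```

Normalizing by instruction count matters because the two runs will 
almost certainly execute a different number of instructions, so raw 
counts alone are not comparable.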

> Eventually when I have time to narrow down the problem myself, I'll 
> launch strace on the benchmark, study the differences and submit the 
> list of system calls to the UML coders asking why some can be faster on 
> UML than on the host kernel.
> 
> Lionel

As long as you're experimenting, you might want to try this with a Xen 
dom0 host and domU guest, or with VMware Server. In either case, 
oprofile on the host should give you some interesting information.