[resend, take 2]
Hi Mike -- there may be some delayed response, because the Usenix
Technical Conference was this past week, and John Hawkes was there and
busy with a paper to be presented.
Mike Kravetz wrote:
> Recently, I was trying to understand kernprof output while
> doing a scalability study. My intention was to run a benchmark
> on a UP (actually 1P SMP kernel) and capture kernprof output.
> I then ran the same benchmark after enabling 7 additional
> processors. I 'thought' that I could do an apples to apples
> comparison of the output to look for routines whose execution
> time increased. However, this is not what I got. It 'looks'
> like the 8P numbers are 8X the UP numbers???
>
I'm not sure what you're referring to, but I have a guess. If you have 1
CPU, and HZ is 100 (which it is), then you will get 100 samples per
second from kernprof. If you have 2 CPUs, then you will get 200 samples
per second, and with 8 CPUs, 800 per second, which would account for
the 8X you are seeing. I believe that the output from gprof will then
show a total of 200% for 2 CPUs (my memory may be bad). If so, that's
not a problem. Each CPU represents 100%.
This seems much more useful than the reverse. For example, if 1 cpu is
100% busy, and the other is 100% idle, the gprof output will show 100%
(well, 99% 8) in cpu_idle(), and another 100% spread around among
various functions. This seems much more readable than 50% in cpu_idle
and 50% spread around. Now imagine a 32 CPU system and how readable it
would be...
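
To make the arithmetic concrete, here is a minimal Python sketch. It is
not kernprof or gprof code; it assumes one profiling sample per CPU per
timer tick, and the function names other than cpu_idle and all of the
sample counts are made up for illustration. It normalizes the same
samples both ways.

```python
HZ = 100  # timer ticks per second per CPU, as stated above


def samples_collected(ncpus, seconds):
    """Total samples, assuming one sample per CPU per tick."""
    return HZ * ncpus * seconds


def percent_per_cpu(counts, ncpus):
    """Each CPU counts as 100%, so the column sums to ncpus * 100%."""
    total = sum(counts.values())
    return {fn: 100.0 * ncpus * n / total for fn, n in counts.items()}


def percent_of_system(counts):
    """The reverse: the whole machine counts as 100%."""
    total = sum(counts.values())
    return {fn: 100.0 * n / total for fn, n in counts.items()}


# Hypothetical 10-second run on 2 CPUs: one CPU idle, one busy.
counts = {"cpu_idle": 1000, "schedule": 600, "do_page_fault": 400}

if __name__ == "__main__":
    print(samples_collected(1, 10), samples_collected(8, 10))
    print(percent_per_cpu(counts, ncpus=2))
    print(percent_of_system(counts))
```

With per-CPU normalization the idle CPU shows up as 100% in cpu_idle no
matter how many CPUs there are; with whole-system normalization the
same idle CPU would shrink to 50% here, and to about 3% on 32 CPUs.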
> While looking at the kernprof code, I noticed the global
> variable 'total_mcount' (in drivers/char/profile.c) is
Personally, I don't use acg mode. For most purposes, I prefer call
backtrace mode. With it, you don't need to build the kernel with
CONFIG_MCOUNT at all, so you can leave CONFIG_KERNPROF enabled (very
minimal overhead) all the time and still get profiling info whenever
you need it, without rebooting.
The gprof output that says how many times a function was called, and
how many times that function called others, is different in call
backtrace mode, but no less useful. And the times it attributes to
callers and callees of a function are actually *accurate* with call
backtrace, whereas acg can be completely inaccurate.
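
A toy illustration of one way this can happen, as a sketch under
assumptions rather than kernprof's or gprof's actual code. When only
call counts are available, gprof splits a callee's time among its
callers in proportion to how often each one called it, which misleads
when calls from one caller are much more expensive than calls from
another. Whether this is the only source of acg inaccuracy is not
claimed here. With backtrace samples, each sample carries its own
caller. All function names and numbers below are hypothetical.

```python
from collections import Counter


def attribute_by_call_counts(callee_time, calls_from):
    """Split callee_time among callers in proportion to call counts."""
    total_calls = sum(calls_from.values())
    return {c: callee_time * n / total_calls for c, n in calls_from.items()}


def attribute_by_backtrace(samples, callee):
    """Count samples with callee innermost, keyed by immediate caller."""
    by_caller = Counter()
    for stack in samples:  # each stack is listed innermost frame first
        if len(stack) > 1 and stack[0] == callee:
            by_caller[stack[1]] += 1
    return dict(by_caller)


# Hypothetical: copy_data is called 10 times by each caller, but the
# calls from big_writer run long and those from small_writer run short.
calls_from = {"small_writer": 10, "big_writer": 10}
samples = ([("copy_data", "big_writer")] * 90
           + [("copy_data", "small_writer")] * 10)

by_counts = attribute_by_call_counts(100, calls_from)
by_stack = attribute_by_backtrace(samples, "copy_data")

if __name__ == "__main__":
    print(by_counts)  # equal split, because the call counts are equal
    print(by_stack)   # follows where the samples actually landed
```

In this made-up case the call-count split charges each caller half of
copy_data's 100 samples, while the backtraces show 90 of them came via
big_writer.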
-- Ethan