kernprof

Re: kernprof performance

To: Mike Kravetz <kravetz@xxxxxxxxxx>
Subject: Re: kernprof performance
From: Ethan Solomita <ethan@xxxxxxxxxxxxxxx>
Date: Mon, 17 Jun 2002 23:25:31 -0700
Cc: kernprof@xxxxxxxxxxx
References: <20020613114140.A2384@w-mikek2.des.beaverton.ibm.com>
Sender: owner-kernprof@xxxxxxxxxxx
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.0.0) Gecko/20020530
[resend, take 2]

Hi Mike -- there may be some delayed response, because the Usenix Technical Conference was this past week, and John Hawkes was there and busy with a paper to be presented.

Mike Kravetz wrote:
> Recently, I was trying to understand kernprof output while
> doing a scalability study. My intention was to run a benchmark
> on a UP (actually 1P SMP kernel) and capture kernprof output.
> I then ran the same benchmark after enabling 7 additional
> processors. I 'thought' that I could do an apples to apples
> comparison of the output to look for routines whose execution
> time increased. However, this is not what I got. It 'looks'
> like the 8P numbers are 8X the UP numbers???
>
I'm not sure what you're referring to, but I have a guess. If you have 1 CPU, and HZ is 100 (which it is), then you will get 100 samples per second from kernprof. If you have 2 CPUs, then you will get 200 samples per second. I believe that the output from gprof will show a total of 200% (my memory may be bad). If so, that's not a problem. Each CPU represents 100%.
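To make that arithmetic concrete, here is a minimal sketch. It assumes every CPU takes exactly one sample per timer tick, and the function names are made up for illustration (they are not part of kernprof or gprof). If the assumption holds, an 8P run would collect about 8 times as many samples as a UP run of the same length.

```python
HZ = 100  # timer ticks per second, as stated above


def expected_samples(num_cpus, seconds, hz=HZ):
    """Samples collected, assuming one sample per CPU per timer tick."""
    return num_cpus * hz * seconds


def total_percent(num_cpus):
    """Total percentage shown if each CPU represents 100% (as recalled above)."""
    return 100 * num_cpus


if __name__ == "__main__":
    # Compare a 1P, 2P and 8P run of one second each.
    for cpus in (1, 2, 8):
        print(cpus, expected_samples(cpus, 1), total_percent(cpus))
```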


This seems much more useful than normalizing everything to a total of 100%. For example, if 1 cpu is 100% busy, and the other is 100% idle, the gprof output will show 100% (well, 99% 8) in cpu_idle(), and another 100% spread around among various functions. This seems much more readable than 50% in cpu_idle and 50% spread around. Now imagine a 32 CPU system and how readable it would be...
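The two ways of presenting the same samples can be compared with a small sketch. The sample counts and the names func_a and func_b are hypothetical, chosen to match the 2 CPU example (one CPU idle, one busy, one second at HZ=100); the helper functions are illustrative and not anything gprof provides.

```python
def per_cpu_percent(samples, samples_per_cpu):
    """Percent of one CPU's time per function: each CPU counts as 100%."""
    return {fn: 100.0 * n / samples_per_cpu for fn, n in samples.items()}


def normalized_percent(samples):
    """Percent of all samples per function: the whole machine counts as 100%."""
    total = sum(samples.values())
    return {fn: 100.0 * n / total for fn, n in samples.items()}


# Hypothetical 1-second capture on 2 CPUs: 100 samples per CPU.
samples = {"cpu_idle": 100, "func_a": 60, "func_b": 40}

if __name__ == "__main__":
    print(per_cpu_percent(samples, 100))  # cpu_idle shows as a full CPU
    print(normalized_percent(samples))    # cpu_idle shows as half the machine
```

With the per-CPU view, a fully idle CPU always reads as 100% in cpu_idle no matter how many CPUs there are; with the normalized view it shrinks to 1/N of the total.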


> While looking at the kernprof code, I noticed the global
> variable 'total_mcount' (in drivers/char/profile.c) is

Personally, I don't use acg mode. For most purposes, I prefer call backtrace mode. With call backtrace, you don't need to build the kernel with CONFIG_MCOUNT at all. So you can enable CONFIG_KERNPROF (very minimal overhead) all the time, and still get profiling info whenever you need it without rebooting.

The output in gprof that says how many times a function was called, and how many times that function called others, is different in call backtrace mode, but no less useful. And the times it attributes to callers and callees of a function are actually *accurate* with call backtrace, whereas acg can be completely inaccurate.
-- Ethan



