What is kernel profiling?
Profiling is the collection of data during the execution of a program for later analysis, for example to study the program's performance or to identify hot spots. Kernel profiling does this for an operating system kernel, in this case Linux. We provide a kernel component and a user-level application, kernprof, that together support a variety of profiling modes and domains.
What is a profiling mode?
There are many different types of data one can collect to produce a profile.
We use the term profiling mode to refer to a particular data-collection
strategy. Different modes add different amounts of overhead to execution
time and provide different types of information. Kernprof supports six modes:
PC sampling periodically records the value of the program counter (PC)
and is the most lightweight of the supported modes. kernprof translates the
numeric PC addresses into kernel procedure names and shows
you how often each procedure was executing when one of those periodic
sampling events occurred.
The larger the count, the bigger the CPU hog.
Call graph mode records information at every function call so it can
construct a so-called call graph, which identifies the callers and the
callees of each function invoked while profiling was on. Recording this
information can add significant overhead (10-15% of execution time is not
uncommon) and requires the kernel to be compiled with special options (frame
pointers and calls to mcount()).
Annotated call graph mode combines the two modes above, PC sampling and
call graph. Its overhead is only slightly higher than that of call graph
mode, and it has the same compilation requirements.
In return, it provides richer information
than PC sampling, allowing one to identify expensive paths
in the code rather than just expensive points.
Scheduler call graph mode records information every time the kernel's
schedule() routine performs a context switch.
This helps the analyst understand the sequence of events that led to each
context switch.
Call count mode records the number of times each function was called
while profiling was on. It is less informative than a call graph,
but more lightweight.
This mode relies on mcount(), but, strictly speaking, does not
require frame pointers.
Unfortunately, gcc (for i386) does not permit the use of mcount()
without frame pointers, so currently this mode is more expensive than it could
be.
Call backtracing extends PC sampling to record an entire call
chain rather than just a PC value. As a result, it can identify
expensive paths rather than just expensive points. It requires frame
pointers (but not mcount()),
and its overhead lies between that of PC sampling and that of call graph.
This mode is particularly useful on NUMA machines because it
does not use shared memory.
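Call graph and call count modes both depend on a hook that runs at every function entry (mcount(), in kernprof's case). The following is a minimal user-space sketch, not kernprof's actual kernel code: it assumes a hypothetical mcount()-style hook, record_arc(), that receives the caller's and the callee's addresses, and it keeps one counter per distinct caller/callee pair ("arc") in a small open-addressing hash table. The names record_arc(), arc_count(), calls_to() and ARC_SLOTS are all illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One counter per distinct (caller, callee) pair: an "arc" of the graph. */
struct arc {
    uintptr_t caller;    /* address in the calling function */
    uintptr_t callee;    /* address of the function that was entered */
    unsigned long count; /* times this arc was traversed; 0 = empty slot */
};

#define ARC_SLOTS 64u /* illustrative table size; must be a power of two */

static struct arc arcs[ARC_SLOTS];

/* Hypothetical mcount()-style hook, invoked once per function entry. */
static void record_arc(uintptr_t caller, uintptr_t callee)
{
    size_t i = (size_t)((caller ^ (callee * 31u)) & (ARC_SLOTS - 1));

    for (size_t probes = 0; probes < ARC_SLOTS; probes++) {
        struct arc *a = &arcs[i];
        if (a->count == 0) {            /* empty slot: start a new arc */
            a->caller = caller;
            a->callee = callee;
            a->count = 1;
            return;
        }
        if (a->caller == caller && a->callee == callee) {
            a->count++;                 /* known arc: bump its counter */
            return;
        }
        i = (i + 1) & (ARC_SLOTS - 1);  /* collision: linear probing */
    }
    assert(!"arc table full");
}

/* Number of times the arc caller -> callee was recorded. */
static unsigned long arc_count(uintptr_t caller, uintptr_t callee)
{
    for (size_t i = 0; i < ARC_SLOTS; i++)
        if (arcs[i].count != 0 && arcs[i].caller == caller &&
            arcs[i].callee == callee)
            return arcs[i].count;
    return 0;
}

/* Call count of a function: the sum over all arcs that end in it. */
static unsigned long calls_to(uintptr_t callee)
{
    unsigned long total = 0;
    for (size_t i = 0; i < ARC_SLOTS; i++)
        if (arcs[i].count != 0 && arcs[i].callee == callee)
            total += arcs[i].count;
    return total;
}
```

Summing the arcs that end in a function yields its call count, so call count mode can be thought of as a call graph with the caller dimension collapsed. Keeping a single counter per function instead of one per arc is one plausible reason it is the lighter of the two.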
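Call backtracing needs frame pointers because the call chain is recovered by following the chain of saved frame pointers from the sampled frame outward. The sketch below walks such a chain over a simulated stack rather than the real one; it assumes the i386-style layout in which each frame begins with the caller's saved frame pointer followed by the return address. The struct frame type and backtrace_fp() are hypothetical names, not kernprof's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed frame layout with frame pointers: at the frame base sits the
 * caller's saved frame pointer, immediately followed by the return
 * address into the caller. */
struct frame {
    const struct frame *prev; /* caller's saved frame pointer */
    uintptr_t ret;            /* return address into the caller */
};

/* Record the sampled PC, then walk the frame-pointer chain starting at fp,
 * storing at most max addresses in out. Returns how many were stored. */
static size_t backtrace_fp(uintptr_t pc, const struct frame *fp,
                           uintptr_t *out, size_t max)
{
    size_t n = 0;

    assert(out != NULL || max == 0);
    if (max == 0)
        return 0;
    out[n++] = pc;                /* innermost entry: the sampled PC */
    while (fp != NULL && n < max) {
        out[n++] = fp->ret;       /* one step outward in the call chain */
        fp = fp->prev;
    }
    return n;
}
```

A real in-kernel walker would also have to check that every frame pointer lies inside the current stack before dereferencing it; the simulated chain above simply ends with a NULL pointer.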
What is a profiling domain?
A domain specifies what event triggers the recording of a sample in the modes
that use sampling (i.e., all except call graph).
Presently there are two domains:
In the time domain, samples are recorded during timer interrupts,
so they are taken at fixed time intervals.
In the PMC domain, we use a CPU's performance monitoring counters to
record a sample every time a selected event has occurred a specified
number of times. Not all CPU architectures support this domain.
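Both domains amount to "record a sample every N occurrences of some event"; only the event differs (a timer tick in the time domain, a hardware event such as a cache miss in the PMC domain, assuming the CPU can count it). The sketch below does the countdown in software to show the logic; with real performance counters the hardware would be preloaded so that it overflows and interrupts after N events. The struct sampler type and its functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU sampling state. */
struct sampler {
    unsigned long period;    /* events between consecutive samples */
    unsigned long remaining; /* events left before the next sample */
    size_t nsamples;
    uintptr_t samples[16];   /* recorded PCs (fixed size for the sketch) */
};

static void sampler_init(struct sampler *s, unsigned long period)
{
    assert(period > 0);
    s->period = period;
    s->remaining = period;
    s->nsamples = 0;
}

/* Called once per occurrence of the selected event, with the current PC. */
static void sampler_event(struct sampler *s, uintptr_t pc)
{
    if (--s->remaining == 0) {
        s->remaining = s->period; /* re-arm, like reloading the counter */
        if (s->nsamples < sizeof s->samples / sizeof s->samples[0])
            s->samples[s->nsamples++] = pc;
    }
}
```

With a period of 3, ten events yield samples at the 3rd, 6th and 9th events. In the time domain the period is measured in ticks, which is why those samples land at fixed time intervals.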
I've applied the kernel patch. How do I get a profile?
First, you need to create the control device /dev/profile, a character
special file with major number 192 and minor number 0 (as root):
mknod /dev/profile c 192 0
If you want to use call backtrace profiling, you also need to create one device
for each CPU on your system: the node for CPU n is /dev/profile<n>, with
major number 192 and minor number n+1.
For example, on a two-CPU system:
mknod /dev/profile0 c 192 1
mknod /dev/profile1 c 192 2
kernprof is mainly used to control the kernel's profiling facilities,
such as choosing the profiling mode and frequency
and turning profiling on and off.
kernprof can also produce profiles from PC sampling
and call count data, while gprof is responsible for producing
profiles for the more complex modes.
For those modes, kernprof sets up the profiling parameters and
then generates the data files that gprof requires as input.
Therefore, unless you're doing only PC sampling or call counts,
you need both kernprof and gprof.
What are the 'USER' and 'UNKNOWN KERNEL' symbols that appear in profiles?
Any PC samples recorded in user space are attributed to USER,
and any samples recorded in kernel mode,
but in a function that does not appear in the static kernel binary (vmlinux),
are attributed to UNKNOWN KERNEL.
Presently, such functions are those belonging to dynamically loaded modules.
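The attribution rule above, combined with the address-to-name translation that PC sampling relies on, can be sketched as a lookup in a sorted symbol table. This is a minimal sketch, not kernprof's code: it assumes an i386 kernel with the default 3 GB/1 GB split (kernel addresses at or above 0xC0000000), decides user versus kernel by address alone (a real implementation might look at the saved privilege level instead), and treats any kernel address outside vmlinux's text range as UNKNOWN KERNEL. The symbol addresses in the usage are made up; attribute_pc() and the bucket names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: i386 default split, so kernel addresses start here. */
#define KERNEL_BASE 0xC0000000u

static const char USER_BUCKET[] = "USER";
static const char UNKNOWN_BUCKET[] = "UNKNOWN KERNEL";

struct sym {
    uint32_t start;   /* first address of the function in vmlinux */
    const char *name;
};

/* syms: functions of the static kernel image, sorted by start address.
 * text_end: end of vmlinux's text, so later addresses (e.g. module code)
 * are not credited to the last vmlinux function. */
static const char *attribute_pc(uint32_t pc, const struct sym *syms,
                                size_t nsyms, uint32_t text_end)
{
    assert(syms != NULL || nsyms == 0);
    if (pc < KERNEL_BASE)
        return USER_BUCKET;
    if (nsyms == 0 || pc < syms[0].start || pc >= text_end)
        return UNKNOWN_BUCKET;

    /* Binary search for the last symbol whose start is <= pc.
     * Invariant: syms[lo].start <= pc, and hi == nsyms or
     * syms[hi].start > pc. */
    size_t lo = 0, hi = nsyms;
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (syms[mid].start <= pc)
            lo = mid;
        else
            hi = mid;
    }
    return syms[lo].name;
}
```

A PC sampling report is then just a histogram keyed by the returned name: each sample increments the count of the function (or of USER / UNKNOWN KERNEL) it maps to.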
How reliable are the sampling-based profiling modes?
Profiling modes that employ statistical sampling rely on interrupts.
They therefore miss functions that execute with interrupts disabled, and
they also miss functions that execute synchronously as a consequence of the
triggering event itself (e.g., if the timer tick that captures the PC
also triggers kernel timeouts that are serviced before the next PC
sampling event occurs, those timeout handlers are never sampled).
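The second blind spot is easy to reproduce in a toy model. This sketch is an illustration under stated assumptions, not a measurement of any kernel: time advances in unit steps, a timer tick fires every tick_len steps and synchronously runs timeout handlers for handler_len steps, and each sample captures whatever was running in the step just before the tick. The names simulate(), running_at(), WORKER and TIMEOUT_HANDLER are hypothetical.

```c
#include <assert.h>

enum { WORKER, TIMEOUT_HANDLER, NFUNCS };

/* Which function runs during step t: the handlers run right after each
 * tick (steps 0 .. handler_len-1 of every period), the worker otherwise. */
static int running_at(unsigned t, unsigned tick_len, unsigned handler_len)
{
    return (t % tick_len) < handler_len ? TIMEOUT_HANDLER : WORKER;
}

/* Simulate `ticks` timer periods. busy[f] receives the true number of
 * steps spent in f; sampled[f] receives the PC samples attributed to f. */
static void simulate(unsigned ticks, unsigned tick_len, unsigned handler_len,
                     unsigned busy[NFUNCS], unsigned sampled[NFUNCS])
{
    assert(tick_len > 0 && handler_len < tick_len);
    for (int f = 0; f < NFUNCS; f++)
        busy[f] = sampled[f] = 0;
    for (unsigned t = 1; t <= ticks * tick_len; t++) {
        busy[running_at(t - 1, tick_len, handler_len)]++;
        if (t % tick_len == 0) /* the tick interrupts step t-1 */
            sampled[running_at(t - 1, tick_len, handler_len)]++;
    }
}
```

With 100 ticks of 10 steps and 3-step handlers, the handlers consume 30% of the time yet receive no samples, because they always finish before the next tick looks at the PC.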
Why do I need to patch gcc?
Stock gcc, through at least version 2.95.3,
has a bug whereby programs that use regparm to pass function
arguments in registers are miscompiled when the -pg compiler switch
is supplied.
Linux uses regparm,
and kernprof needs the kernel to be compiled with -pg in order to get
mcount()-based profiling support.
Without the patch, stock gcc
produces a kernel that crashes or hangs early in the boot sequence.
The patch supplied here modifies gcc so that it generates correct code
when regparm and -pg are combined.
You can download the gcc source from the GNU Project.
The most obvious reason for kernprof to fail with the error
/dev/profile: No such file or directory
is that there really is no /dev/profile node.
First, check whether the node exists. If it doesn't, you must create it (as root):
mknod /dev/profile c 192 0
If kernprof (and the kernel) work in "-t pc" mode, but for "-t cg",
"-t scg", and various other modes you see the error
kernprof: kernel does not support call graph mode
then you have probably built the kernel with CONFIG_MCOUNT turned off.
Turn this option on (in the "Kernel hacks" subsection).
However, for i386 kernels you must apply the gcc patch, and build the kernel
with the patched gcc, when CONFIG_MCOUNT is turned on.