kernprof
[Top] [All Lists]

kernprof performance

To: kernprof@xxxxxxxxxxx
Subject: kernprof performance
From: Mike Kravetz <kravetz@xxxxxxxxxx>
Date: Thu, 13 Jun 2002 11:41:40 -0700
Sender: owner-kernprof@xxxxxxxxxxx
User-agent: Mutt/1.2.5.1i
Recently, I was trying to understand kernprof output while
doing a scalability study.  My intention was to run a benchmark
on a UP (actually 1P SMP kernel) and capture kernprof output.
I then ran the same benchmark after enabling 7 additional
processors.  I 'thought' that I could do an apples to apples
comparison of the output to look for routines whose execution
time increased.  However, this is not what I got.  It 'looks'
like the 8P numbers are 8X the UP numbers???

Anyone know if kernprof can be used for such a comparison?

While looking at the kernprof code, I noticed the global
variable 'total_mcount' (in drivers/char/profile.c)  is
incremented each time mcount is called:  OUCH!!!  Since mcount
is called upon entry to each function, it is hard to imagine
a more contested cache line.  It appears total_mcount is only
used for informational purposes.  I commented out the one
line that increments this variable and did some 'before and
after' runs.  The output is below.  Looks like updating  this
global variable accounts for about 20% of the total time
of running this benchmark with profiling enabled.  I was going
to create a patch to create per-CPU instances of total_mcount
each on a separate cache line.  However, I couldn't find any
other code in this file that appeared to be concerned with
cache line boundaries.  Perhaps there is room for improvement
in this area?

I don't mean to be too down on the kernprof implementation.
I think it is a great tool that I frequently use.  Perhaps
someday I'll get some time to help out with it's development/
support.

Thanks,
-- 
Mike

Original kernprof
-----------------
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  us/call  us/call  name
 35.51   1062.13  1062.13                             cg_record_arc
 12.82   1445.64   383.51 10733690    35.73    35.73  sys_sched_yield
 12.66   1824.38   378.74 13239210    28.61    28.94  schedule
 11.47   2167.45   343.07  2874286   119.36   119.83  send_sig_info
  8.82   2431.30   263.85                             USER
  5.56   2597.67   166.37  2640054    63.02    63.02  _wake_up
  1.34   2637.88    40.21    17832  2254.93  2254.93  poll_idle
  1.12   2671.50    33.62  2874280    11.70    15.20  do_signal
  0.87   2697.61    26.11                             stext_lock
  0.63   2716.55    18.94                             mcount
  0.57   2733.56    17.01  6577379     2.59     3.29  netif_rx
  0.54   2749.66    16.10  4005451     4.02    17.10  tcp_sendmsg
  0.51   2765.06    15.40  8433015     1.83     1.83  mod_timer
  0.49   2779.74    14.68  6577234     2.23    31.01  tcp_v4_rcv
  0.35   2790.16    10.42                             system_call
  0.28   2798.62     8.46  6576949     1.29     7.86  ip_queue_xmit
  0.28   2806.99     8.37  6577234     1.27     5.58  dev_queue_xmit
  0.26   2814.81     7.82  6576949     1.19     9.42  tcp_transmit_skb

kernprof with total_mcount commented out
----------------------------------------
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  us/call  us/call  name
 20.04    476.96   476.96 10134665    47.06    47.06  sys_sched_yield
 18.60    919.57   442.61 12921479    34.25    34.64  schedule
 16.31   1307.74   388.17  2857557   135.84   136.32  send_sig_info
 13.32   1624.77   317.03                             cg_record_arc
 10.86   1883.18   258.41                             USER
  7.59   2063.70   180.52  2605133    69.29    69.29  _wake_up
  1.24   2093.16    29.46    19011  1549.63  1549.63  poll_idle
  0.77   2111.50    18.34                             mcount
  0.68   2127.75    16.25  2857551     5.69     9.08  do_signal
  0.64   2142.90    15.15  4005451     3.78    14.51  tcp_sendmsg
  0.59   2156.92    14.02  8310456     1.69     1.69  mod_timer
  0.58   2170.75    13.83  6387602     2.17    33.20  tcp_v4_rcv
  0.41   2180.50     9.75                             system_call
  0.34   2188.49     7.99  6387193     1.25     5.85  ip_queue_xmit
  0.30   2195.72     7.23  6387602     1.13     3.69  dev_queue_xmit
  0.30   2202.84     7.12 10227829     0.70     1.34  _kfree_skb
  0.30   2209.89     7.05  6387193     1.10     7.37  tcp_transmit_skb
  0.29   2216.89     7.00  6386259     1.10    10.01  tcp_rcv_established


<Prev in Thread] Current Thread [Next in Thread>