From owner-kernprof@oss.sgi.com Thu Jun 13 11:39:54 2002 Received: from oss.sgi.com (localhost [127.0.0.1]) by oss.sgi.com (8.12.3/8.12.3) with ESMTP id g5DIdsnC029626 for ; Thu, 13 Jun 2002 11:39:54 -0700 Received: (from majordomo@localhost) by oss.sgi.com (8.12.3/8.12.3/Submit) id g5DIdsPX029625 for kernprof-outgoing; Thu, 13 Jun 2002 11:39:54 -0700 X-Authentication-Warning: oss.sgi.com: majordomo set sender to owner-kernprof@oss.sgi.com using -f Received: from e31.co.us.ibm.com (e31.co.us.ibm.com [32.97.110.129]) by oss.sgi.com (8.12.3/8.12.3) with SMTP id g5DIdgnC029617 for ; Thu, 13 Jun 2002 11:39:42 -0700 Received: from twestrelay05.boulder.ibm.com (twestrelay05.boulder.ibm.com [9.17.194.56]) by e31.co.us.ibm.com (8.12.2/8.12.2) with ESMTP id g5DIgEcx037216 for ; Thu, 13 Jun 2002 14:42:14 -0400 Received: from w-mikek2.des.beaverton.ibm.com (w-mikek2.des.beaverton.ibm.com [9.47.17.42]) by twestrelay05.boulder.ibm.com (8.11.1m3/NCO/VER6.1) with ESMTP id g5DIgDn89016 for ; Thu, 13 Jun 2002 12:42:14 -0600 Received: (from mkravetz@localhost) by w-mikek2.des.beaverton.ibm.com (8.11.6/8.11.6) id g5DIfeW02402 for kernprof@oss.sgi.com; Thu, 13 Jun 2002 11:41:40 -0700 Date: Thu, 13 Jun 2002 11:41:40 -0700 From: Mike Kravetz To: kernprof@oss.sgi.com Subject: kernprof performance Message-ID: <20020613114140.A2384@w-mikek2.des.beaverton.ibm.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i Sender: owner-kernprof@oss.sgi.com Precedence: bulk Recently, I was trying to understand kernprof output while doing a scalability study. My intention was to run a benchmark on a UP (actually 1P SMP kernel) and capture kernprof output. I then ran the same benchmark after enabling 7 additional processors. I 'thought' that I could do an apples to apples comparison of the output to look for routines whose execution time increased. However, this is not what I got. It 'looks' like the 8P numbers are 8X the UP numbers??? Anyone know if kernprof can be used for such a comparison? While looking at the kernprof code, I noticed the global variable 'total_mcount' (in drivers/char/profile.c) is incremented each time mcount is called: OUCH!!! Since mcount is called upon entry to each function, it is hard to imagine a more contested cache line. It appears total_mcount is only used for informational purposes. I commented out the one line that increments this variable and did some 'before and after' runs. The output is below. Looks like updating this global variable accounts for about 20% of the total time of running this benchmark with profiling enabled. I was going to create a patch to create per-CPU instances of total_mcount each on a separate cache line. However, I couldn't find any other code in this file that appeared to be concerned with cache line boundaries. Perhaps there is room for improvement in this area? I don't mean to be too down on the kernprof implementation. I think it is a great tool that I frequently use. Perhaps someday I'll get some time to help out with it's development/ support. Thanks, -- Mike Original kernprof ----------------- Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls us/call us/call name 35.51 1062.13 1062.13 cg_record_arc 12.82 1445.64 383.51 10733690 35.73 35.73 sys_sched_yield 12.66 1824.38 378.74 13239210 28.61 28.94 schedule 11.47 2167.45 343.07 2874286 119.36 119.83 send_sig_info 8.82 2431.30 263.85 USER 5.56 2597.67 166.37 2640054 63.02 63.02 _wake_up 1.34 2637.88 40.21 17832 2254.93 2254.93 poll_idle 1.12 2671.50 33.62 2874280 11.70 15.20 do_signal 0.87 2697.61 26.11 stext_lock 0.63 2716.55 18.94 mcount 0.57 2733.56 17.01 6577379 2.59 3.29 netif_rx 0.54 2749.66 16.10 4005451 4.02 17.10 tcp_sendmsg 0.51 2765.06 15.40 8433015 1.83 1.83 mod_timer 0.49 2779.74 14.68 6577234 2.23 31.01 tcp_v4_rcv 0.35 2790.16 10.42 system_call 0.28 2798.62 8.46 6576949 1.29 7.86 ip_queue_xmit 0.28 2806.99 8.37 6577234 1.27 5.58 dev_queue_xmit 0.26 2814.81 7.82 6576949 1.19 9.42 tcp_transmit_skb kernprof with total_mcount commented out ---------------------------------------- Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls us/call us/call name 20.04 476.96 476.96 10134665 47.06 47.06 sys_sched_yield 18.60 919.57 442.61 12921479 34.25 34.64 schedule 16.31 1307.74 388.17 2857557 135.84 136.32 send_sig_info 13.32 1624.77 317.03 cg_record_arc 10.86 1883.18 258.41 USER 7.59 2063.70 180.52 2605133 69.29 69.29 _wake_up 1.24 2093.16 29.46 19011 1549.63 1549.63 poll_idle 0.77 2111.50 18.34 mcount 0.68 2127.75 16.25 2857551 5.69 9.08 do_signal 0.64 2142.90 15.15 4005451 3.78 14.51 tcp_sendmsg 0.59 2156.92 14.02 8310456 1.69 1.69 mod_timer 0.58 2170.75 13.83 6387602 2.17 33.20 tcp_v4_rcv 0.41 2180.50 9.75 system_call 0.34 2188.49 7.99 6387193 1.25 5.85 ip_queue_xmit 0.30 2195.72 7.23 6387602 1.13 3.69 dev_queue_xmit 0.30 2202.84 7.12 10227829 0.70 1.34 _kfree_skb 0.30 2209.89 7.05 6387193 1.10 7.37 tcp_transmit_skb 0.29 2216.89 7.00 6386259 1.10 10.01 tcp_rcv_established From owner-kernprof@oss.sgi.com Sun Jun 16 23:56:37 2002 Received: from oss.sgi.com (localhost [127.0.0.1]) by oss.sgi.com (8.12.3/8.12.3) with ESMTP id g5H6ubnC012602 for ; Sun, 16 Jun 2002 23:56:37 -0700 Received: (from majordomo@localhost) by oss.sgi.com (8.12.3/8.12.3/Submit) id g5H6uaU6012601 for kernprof-outgoing; Sun, 16 Jun 2002 23:56:36 -0700 X-Authentication-Warning: oss.sgi.com: majordomo set sender to owner-kernprof@oss.sgi.com using -f Received: from e1.ny.us.ibm.com (e1.ny.us.ibm.com [32.97.182.101]) by oss.sgi.com (8.12.3/8.12.3) with SMTP id g5H6uMnC012598 for ; Sun, 16 Jun 2002 23:56:23 -0700 Received: from northrelay03.pok.ibm.com (northrelay03.pok.ibm.com [9.56.224.151]) by e1.ny.us.ibm.com (8.12.2/8.12.2) with ESMTP id g5H6u9g5171270; Mon, 17 Jun 2002 02:56:09 -0400 Received: from asterix.in.ibm.com (asterix.in.ibm.com [9.182.12.249]) by northrelay03.pok.ibm.com (8.11.1m3/NCO/VER6.1) with ESMTP id g5H6u6L24002; Mon, 17 Jun 2002 02:56:06 -0400 Received: (from kirant@localhost) by asterix.in.ibm.com (8.11.2/8.11.2) id g5H71lL17857; Mon, 17 Jun 2002 12:31:47 +0530 Date: Mon, 17 Jun 2002 12:31:47 +0530 From: Ravikiran G Thirumalai To: kernprof Cc: lse , linux-kernel@vger.kernel.org Subject: [RFC - Patch] Kernprof kernel profiling in NMI mode Message-ID: <20020617123147.F1319@in.ibm.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i Sender: owner-kernprof@oss.sgi.com Precedence: bulk Here's a patch to enable SGI kernprof kernel profiling in NMI mode (x86 only). This patch sets the PMC overflow apic interrupt delivery mode to NMI, and gathers profile data in the NMI handler. This will enable PMC domain profiling to happen in NMI mode. NMI mode equivalent of time domain profiling can be carried out by doing a PMC domain profiling with CPU_CLKS_UNHALTED event (P6 family). With this patch kernprof can: * Profile routines/code paths which run with on-chip interrupts disabled * Give accurate and indicative 'verbose' output - in the sense events/ticks in interrput disabled code-paths show up at the actual event generating locations (unlike, say all L2 misses/profile ticks showing up after sti) This patch adds an extra call to the profiling path (That can be made inline on the next patch release). Also, a config option is provided to enable NMI based profiling. This patch applies neatly on top of SGI kernprof version 0.12, patch for 2.5.8 profile-0.12.1-2.5.8.patch, and requires no changes to the user level kernprof command tool. Patch has been tested on a 4way PIII Xeon box. Request comments on usefulness, improvements, potential breakages :)..... Kiran diff -ruN -X dontdiff linux-2.5.8/arch/i386/Config.help nmi-kern/arch/i386/Config.help --- linux-2.5.8/arch/i386/Config.help Thu Jun 13 18:23:22 2002 +++ nmi-kern/arch/i386/Config.help Thu Jun 13 19:22:47 2002 @@ -946,6 +946,12 @@ The module will be called profile.o. If you want to compile it as a module, say M here and read Documentation/modules.txt. +CONFIG_NMI_PROFILING + Saying Y here will set the interrupt delivery mode for PMC domain + profiling to NMI. This is useful if you want to profile kernel code + paths which mask interrputs (eg. code between spin_lock_irq and + spin_unlock_irq). + CONFIG_MCOUNT This will instrument the kernel with calls to mcount(), which enables call-graph and call-count profiling. Because mcount() is called at diff -ruN -X dontdiff linux-2.5.8/arch/i386/config.in nmi-kern/arch/i386/config.in --- linux-2.5.8/arch/i386/config.in Thu Jun 13 18:23:22 2002 +++ nmi-kern/arch/i386/config.in Thu Jun 13 19:12:22 2002 @@ -423,6 +423,7 @@ tristate 'Kernel Profiling Support' CONFIG_PROFILING if [ "$CONFIG_PROFILING" != "n" ]; then define_bool CONFIG_FRAME_POINTER y + bool ' Use NMI delivery for PMC events profiling' CONFIG_NMI_PROFILING fi bool 'Instrument kernel at entry to all C functions' CONFIG_MCOUNT if [ "$CONFIG_MCOUNT" = "y" ]; then diff -ruN -X dontdiff linux-2.5.8/arch/i386/kernel/apic.c nmi-kern/arch/i386/kernel/apic.c --- linux-2.5.8/arch/i386/kernel/apic.c Thu Jun 13 18:23:22 2002 +++ nmi-kern/arch/i386/kernel/apic.c Thu Jun 13 19:48:04 2002 @@ -30,6 +30,7 @@ #include #include #include +#include /* Using APIC to generate smp_local_timer_interrupt? */ int using_apic_timer = 0; @@ -971,7 +972,7 @@ void __init setup_APIC_perfctr_vector(void *unused) { (void) apic_read(APIC_LVTPC); - apic_write(APIC_LVTPC, PERFCTR_OVFL_VECTOR); + apic_write(APIC_LVTPC, PERFCTR_OVFL_VECTOR_MODE); } void __init setup_APIC_perfctr(void) diff -ruN -X dontdiff linux-2.5.8/arch/i386/kernel/traps.c nmi-kern/arch/i386/kernel/traps.c --- linux-2.5.8/arch/i386/kernel/traps.c Mon Apr 15 00:48:46 2002 +++ nmi-kern/arch/i386/kernel/traps.c Thu Jun 13 19:03:42 2002 @@ -491,6 +491,10 @@ return; } #endif +#if CONFIG_NMI_PROFILING + smp_apic_perfctr_overflow_interrupt(*regs); + return; +#endif unknown_nmi_error(reason, regs); return; } diff -ruN -X dontdiff linux-2.5.8/include/asm-i386/profile.h nmi-kern/include/asm-i386/profile.h --- linux-2.5.8/include/asm-i386/profile.h Thu Jun 13 18:23:22 2002 +++ nmi-kern/include/asm-i386/profile.h Thu Jun 13 20:18:15 2002 @@ -137,6 +137,12 @@ wrmsr(MSR_P6_EVNTSEL1, 0, 0); wrmsr(MSR_P6_EVNTSEL0, EVENTSEL0_ENABLE_MASK | (evt), 0); } +#ifdef CONFIG_NMI_PROFILING +#define PERFCTR_OVFL_VECTOR_MODE \ + SET_APIC_DELIVERY_MODE(PERFCTR_OVFL_VECTOR, APIC_MODE_NMI) +#else +#define PERFCTR_OVFL_VECTOR_MODE PERFCTR_OVFL_VECTOR +#endif #else #define have_perfctr() 0 #define valid_perfctr_event(e) 0 From owner-kernprof@oss.sgi.com Mon Jun 17 23:22:51 2002 Received: from oss.sgi.com (localhost [127.0.0.1]) by oss.sgi.com (8.12.3/8.12.3) with ESMTP id g5I6MpnC006951 for ; Mon, 17 Jun 2002 23:22:51 -0700 Received: (from majordomo@localhost) by oss.sgi.com (8.12.3/8.12.3/Submit) id g5I6MoOB006950 for kernprof-outgoing; Mon, 17 Jun 2002 23:22:50 -0700 X-Authentication-Warning: oss.sgi.com: majordomo set sender to owner-kernprof@oss.sgi.com using -f Received: from rwcrmhc53.attbi.com (rwcrmhc53.attbi.com [204.127.198.39]) by oss.sgi.com (8.12.3/8.12.3) with SMTP id g5I6MinC006946 for ; Mon, 17 Jun 2002 23:22:44 -0700 Received: from cs.columbia.edu ([12.234.195.52]) by rwcrmhc53.attbi.com (InterMail vM.4.01.03.27 201-229-121-127-20010626) with ESMTP id <20020618062531.LAGU11659.rwcrmhc53.attbi.com@cs.columbia.edu>; Tue, 18 Jun 2002 06:25:31 +0000 Message-ID: <3D0ED25B.5030304@cs.columbia.edu> Date: Mon, 17 Jun 2002 23:25:31 -0700 From: Ethan Solomita User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.0.0) Gecko/20020530 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Mike Kravetz CC: kernprof@oss.sgi.com Subject: Re: kernprof performance References: <20020613114140.A2384@w-mikek2.des.beaverton.ibm.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-kernprof@oss.sgi.com Precedence: bulk [resend, take 2] Hi Mike -- there may be some delayed response, because the Usenix Technical Conference was this past week, and John Hawkes was there and busy with a paper to be presented. Mike Kravetz wrote: > Recently, I was trying to understand kernprof output while > doing a scalability study. My intention was to run a benchmark > on a UP (actually 1P SMP kernel) and capture kernprof output. > I then ran the same benchmark after enabling 7 additional > processors. I 'thought' that I could do an apples to apples > comparison of the output to look for routines whose execution > time increased. However, this is not what I got. It 'looks' > like the 8P numbers are 8X the UP numbers??? > I'm not sure what you're referring to, but I have a guess. If you have 1 CPU, and HZ is 100 (which it is), then you will get 100 samples per second from kernprof. If you have 2 CPUs, then you will get 200 samples per second. I believe that the output from gprof will show a total of 200% (my memory may be bad). If so, that's not a problem. Each CPU represents 100%. This seems much more useful than the reverse. For example, if 1 cpu is 100% busy, and the other is 100% idle, the gprof output will show 100% (well, 99% 8) in cpu_idle(), and another 100% spread around among various functions. This seems much more readable than 50% in cpu_idle and 50% spread around. Now imagine a 32 CPU system and how readable it would be... > While looking at the kernprof code, I noticed the global > variable 'total_mcount' (in drivers/char/profile.c) is Personally, I don't use acg mode. For most purposes, I prefer call backtrace mode. You don't need to build the kernel with CONFIG_MCOUNT at all. So you can enable CONFIG_KERNPROF (very minimal overhead) all the time, and still get profiling info whenever you need it without rebooting. The output in gprof that says how many times a function was called, and how many times that function called others, is different. But no less useful. And the times it attributes to callers and callees of a function are actually *accurate* with call backtrace, whereas acg can be completely inaccurate. -- Ethan From owner-kernprof@oss.sgi.com Tue Jun 18 08:17:05 2002 Received: from oss.sgi.com (localhost [127.0.0.1]) by oss.sgi.com (8.12.3/8.12.3) with ESMTP id g5IFH5nC017342 for ; Tue, 18 Jun 2002 08:17:05 -0700 Received: (from majordomo@localhost) by oss.sgi.com (8.12.3/8.12.3/Submit) id g5IFH5FJ017341 for kernprof-outgoing; Tue, 18 Jun 2002 08:17:05 -0700 X-Authentication-Warning: oss.sgi.com: majordomo set sender to owner-kernprof@oss.sgi.com using -f Received: from sgi.com (sgi-too.SGI.COM [204.94.211.39]) by oss.sgi.com (8.12.3/8.12.3) with SMTP id g5IFH0nC017337 for ; Tue, 18 Jun 2002 08:17:00 -0700 Received: from cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2]) by sgi.com (980327.SGI.8.8.8-aspam/980304.SGI-aspam: SGI does not authorize the use of its proprietary systems or networks for unsolicited or bulk email from the Internet.) via ESMTP id IAA05447 for ; Tue, 18 Jun 2002 08:19:47 -0700 (PDT) mail_from (hawkes@sgi.com) Received: from wrlarun (sshgate.corp.sgi.com [169.238.216.146]) by cthulhu.engr.sgi.com (SGI-8.9.3/8.9.3) with SMTP id IAA50203; Tue, 18 Jun 2002 08:15:19 -0700 (PDT) Message-ID: <000a01c216da$a6f6f800$6601a8c0@attbi.com> From: "John Hawkes" To: "Mike Kravetz" , References: <20020613114140.A2384@w-mikek2.des.beaverton.ibm.com> Subject: Re: kernprof performance Date: Tue, 18 Jun 2002 08:13:07 -0700 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 5.00.2919.6600 X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2919.6600 Sender: owner-kernprof@oss.sgi.com Precedence: bulk From: "Mike Kravetz" ... > While looking at the kernprof code, I noticed the global > variable 'total_mcount' (in drivers/char/profile.c) is > incremented each time mcount is called: OUCH!!! Since mcount > is called upon entry to each function, it is hard to imagine > a more contested cache line. ... Thanks for the suggestion. Yes, you're right, I haven't spent much time making kernprof more efficient. I've made some changes in this area to move total_mcount and some other per-cpu counters into isolated cachelines. I'll test them and release the new drivers/char/kernprof.c this week. John Hawkes