| To: | "performancecopilot/pcp" <reply+00bd08b65447cd9ca8b0550f4b312c6e546648bf11a7f67d92cf0000000113cb42db92a169ce0a350886@xxxxxxxxxxxxxxxx> |
|---|---|
| Subject: | Re: [pcp] [performancecopilot/pcp] pmcd causes complete system lockup on CentOS 7 on VMware (#107) |
| From: | Mark Goodwin <mgoodwin@xxxxxxxxxx> |
| Date: | Wed, 17 Aug 2016 08:00:45 +1000 |
| Cc: | "performancecopilot/pcp" <pcp@xxxxxxxxxxxxxxxxxx>, Comment <comment@xxxxxxxxxxxxxxxxxx>, pcpemail <pcp@xxxxxxxxxxx> |
| Delivered-to: | pcp@xxxxxxxxxxx |
| In-reply-to: | <performancecopilot/pcp/issues/107/240239580@xxxxxxxxxx> |
| References: | <performancecopilot/pcp/issues/107@xxxxxxxxxx> <performancecopilot/pcp/issues/107/240239580@xxxxxxxxxx> |

The vmcore shows it's the perfevent PMDA. This is a VMware guest, and there are known kernel issues with x86_perf_event_update() calling native_read_pmc(); VMware apparently doesn't implement all the hardware events. Your actual crash was triggered by 'salt-minion', which is also tripping up in native_read_pmc().

Hardware perf events are probably not much use in a virtual machine anyway, so either manually comment out 'perfevent' in pmcd.conf, or run /var/lib/pcp/pmdas/perfevent/Remove. You may also have to turn off 'salt-minion'.

For PCP, the perfevent PMDA should probably detect that it's running in a guest and refuse to start unless forced, or something along those lines. I'm not sure of a programmatic way to determine that, but there will be some way for sure.

This particular issue has been reported before: see BZ 1178606, 'general protection fault in native_read_pmc while running perf on VMware guest', which was posted against RHEL6.

```
[  134.800273] general protection fault: 0000 [#1] SMP
[  134.800304] Modules linked in: ext4 mbcache jbd2 rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx5_ib(OE) mlx5_core(OE) mlx4_en(OE) vxlan ip6_udp_tunnel udp_tunnel ptp pps_core mlx4_ib(OE) ib_sa(OE) ib_mad(OE) ib_core(OE) ib_addr(OE) mlx4_core(OE) mlx_compat(OE) coretemp ppdev sg vmw_balloon pcspkr shpchp parport_pc i2c_piix4 parport vmw_vmci nfsd knem(OE) auth_rpcgss ip_tables nfsv3 nfs_acl nfs lockd grace fscache sd_mod crc_t10dif crct10dif_generic sr_mod cdrom ata_generic pata_acpi crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel ghash_clmulni_intel vmwgfx aesni_intel lrw gf128mul glue_helper ablk_helper cryptd serio_raw drm_kms_helper ttm vmxnet3 ahci vmw_pvscsi libahci drm ata_piix libata i2c_core floppy sunrpc dm_mirror dm_region_hash
[  134.800645]  dm_log dm_mod
[  134.800656] CPU: 0 PID: 2934 Comm: salt-minion Tainted: G           OE  ------------   3.10.0-327.13.1.el7.x86_64 #1
[  134.800691] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/17/2015
[  134.800735] task: ffff880073299700 ti: ffff88007977c000 task.ti: ffff88007977c000
[  134.800762] RIP: 0010:[<ffffffff81058d66>]  [<ffffffff81058d66>] native_read_pmc+0x6/0x20
[  134.800796] RSP: 0000:ffff88007ce03ef0  EFLAGS: 00010083
[  134.800814] RAX: ffffffff81957ee0 RBX: 0000000000000000 RCX: 0000000040000002
[  134.800838] RDX: 0000000051c31ddb RSI: ffff88007ce17fa8 RDI: 0000000040000002
[  134.800863] RBP: ffff88007ce03ef0 R08: 000000000000001b R09: 00007fff2dc71714
[  134.800887] R10: 0000000000000001 R11: 00007ff043fd9c40 R12: ffffffff80000001
[  134.800910] R13: ffff880077763400 R14: ffff880077763578 R15: 0000000000000010
[  134.800934] FS:  00007ff045117740(0000) GS:ffff88007ce00000(0000) knlGS:0000000000000000
[  134.800960] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  134.800979] CR2: 0000000001342220 CR3: 0000000077c78000 CR4: 00000000001407f0
[  134.801030] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  134.801082] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  134.801106] Stack:
[  134.801115]  ffff88007ce03f28 ffffffff81029e03 0000000000000000 ffff880077763400
[  134.801146]  ffff88007ce17fb4 00007ff044ee9540 00007ff044ee9710 ffff88007ce03f38
[  134.801175]  ffffffff8102a079 ffff88007ce03f60 ffffffff811591fe ffff88007323fd90
[  134.801205] Call Trace:
[  134.801215]  <IRQ>
[  134.801234]  [<ffffffff81029e03>] x86_perf_event_update+0x43/0x90
[  134.801252]  [<ffffffff8102a079>] x86_pmu_read+0x9/0x10
[  134.801272]  [<ffffffff811591fe>] __perf_event_read+0xfe/0x110
[  134.801294]  [<ffffffff810e6b3d>] flush_smp_call_function_queue+0x5d/0x130
[  134.801318]  [<ffffffff810e7213>] generic_smp_call_function_single_interrupt+0x13/0x30
[  134.801345]  [<ffffffff81046c77>] smp_call_function_single_interrupt+0x27/0x40
[  134.801371]  [<ffffffff81646f9d>] call_function_single_interrupt+0x6d/0x80
[  134.801393]  <EOI>
[  134.801401] Code: c0 48 c1 e2 20 89 0e 48 09 c2 48 89 d0 5d c3 66 0f 1f 44 00 00 55 89 f0 89 f9 48 89 e5 0f 30 31 c0 5d c3 66 90 55 89 f9 48 89 e5 <0f> 33 89 c0 48 c1 e2 20 48 09 c2 48 89 d0 5d c3 66 2e 0f 1f 84
[  134.801600] RIP  [<ffffffff81058d66>] native_read_pmc+0x6/0x20
[  134.801633]  RSP <ffff88007ce03ef0>
```

Interestingly, both the perfevent PMDA and salt-minion were running when the crash occurred, and both were reading a perf event:

```
crash> ps | grep '^>'
>  2934     1   0  ffff880073299700  RU   3.3  712948  69680  salt-minion
>  4958  4950   1  ffff880073355c00  RU   0.2   76468   3412  pmdaperfevent
crash> bt 4958
PID: 4958   TASK: ffff880073355c00  CPU: 1   COMMAND: "pmdaperfevent"
 #0 [ffff88007cf05e70] crash_nmi_callback at ffffffff810458f2
 #1 [ffff88007cf05e80] nmi_handle at ffffffff8163e8d9
 #2 [ffff88007cf05ec8] do_nmi at ffffffff8163e9f0
 #3 [ffff88007cf05ef0] nmi_restore at ffffffff8163dd13
    [exception RIP: generic_exec_single+314]
    RIP: ffffffff810e687a  RSP: ffff88007323fd90  RFLAGS: 00000202
    RAX: 00000000000008fb  RBX: ffff88007323fd90  RCX: 0000000000000000
    RDX: 00000000000008fb  RSI: 00000000000000fb  RDI: 0000000000000286
    RBP: ffff88007323fdd8   R8: 0000000000000001   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000293  R12: 0000000000000000
    R13: 0000000000000001  R14: ffff880077763400  R15: ffff88007323fea0
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #4 [ffff88007323fd90] generic_exec_single at ffffffff810e687a
 #5 [ffff88007323fde0] smp_call_function_single at ffffffff810e697f
 #6 [ffff88007323fe10] perf_event_read_value at ffffffff811584e2
 #7 [ffff88007323fe40] perf_event_read_value at ffffffff81158533
 #8 [ffff88007323fe80] perf_read at ffffffff81158cf0
 #9 [ffff88007323ff08] vfs_read at ffffffff811de4ec
#10 [ffff88007323ff38] sys_write at ffffffff811df03f
#11 [ffff88007323ff80] sysret_check at ffffffff81645ec9
    RIP: 00007f72e884222d  RSP: 00007fff33242fd8  RFLAGS: 00010206
    RAX: 0000000000000000  RBX: ffffffff81645ec9  RCX: 0000000000000001
    RDX: 0000000000000018  RSI: 0000000000789250  RDI: 0000000000000006
    RBP: 0000000000000000   R8: 0000000000000000   R9: 0000000051c2fbf5
    R10: 0000000000000000  R11: 0000000000000293  R12: 0000000000789b48
    R13: 0000000000000000  R14: 0000000000789568  R15: 0000000000789250
    ORIG_RAX: 0000000000000000  CS: 0033  SS: 002b
```

On Wed, Aug 17, 2016 at 7:08 AM, Ken McDonell <notifications@xxxxxxxxxx> wrote: