pcp

Re: [performancecopilot/pcp] pmcd causes complete system lockup on CentO

To: performancecopilot/pcp <pcp@xxxxxxxxxxxxxxxxxx>
Subject: Re: [performancecopilot/pcp] pmcd causes complete system lockup on CentOS 7 on VMware (#107)
From: Ken McDonell <notifications@xxxxxxxxxx>
Date: Tue, 16 Aug 2016 14:08:43 -0700
Cc: pcpemail <pcp@xxxxxxxxxxx>, Comment <comment@xxxxxxxxxxxxxxxxxx>

The screencast suggests the hang happens about 20 seconds after pmcd starts. That is interesting: it suggests this is NOT an initialization error, but possibly something pmFetch-related or some self-timer-driven event in a PMDA.

Was pmlogger enabled on this system?

Another approach is to track down the PMDA that is responsible (it is unlikely to be pmcd itself). You have 11 PMDAs configured in /etc/pcp/pmcd/pmcd.conf ... I'd start by commenting out about half of them (insert a # at the start of each line), especially the ones with low-level hardware or deep kernel contact, e.g. perfevent, jbd2, nvidia, slurm, xfs, linux, proc. Then restart pmcd and try again.
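A minimal sketch of the commenting-out step, in Python. The helper name `comment_out_pmdas` is hypothetical, not part of PCP. It assumes each PMDA entry in pmcd.conf is a line whose first whitespace-separated field is the PMDA name, and that an optional `[access]` section, if present, follows the PMDA entries and must be left alone. It only transforms text; back up the real file before writing the result over it.

```python
# Hypothetical helper (not part of PCP): prefix "#" to the pmcd.conf lines
# for the named PMDAs. Assumes the PMDA name is the first field on each entry
# line and that nothing after an "[access]" line is a PMDA entry.

def comment_out_pmdas(conf_text: str, names: set[str]) -> str:
    out = []
    in_access = False
    for line in conf_text.splitlines(keepends=True):
        stripped = line.strip()
        if stripped.startswith("[access]"):
            in_access = True          # access-control rules, never touch these
        fields = stripped.split()
        if (not in_access
                and fields
                and not stripped.startswith("#")   # already commented out
                and fields[0] in names):
            line = "#" + line         # disable this PMDA entry
        out.append(line)
    return "".join(out)
```

For example, passing `{"perfevent", "jbd2", "nvidia"}` would disable just those three entries and leave existing comments, the remaining PMDAs and any `[access]` rules untouched.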

If the system survives, you may be able to binary-chop your way to identifying which PMDA is the culprit.
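The binary chop itself can be sketched as follows, again as a hypothetical Python helper rather than anything shipped with PCP. The `hangs` callback stands in for the manual test (edit pmcd.conf so only the given PMDAs are enabled, restart pmcd, and wait past the ~20 second mark). The sketch assumes exactly one PMDA is responsible and that the lockup is reproducible.

```python
from typing import Callable, Sequence


def find_culprit(pmdas: Sequence[str],
                 hangs: Callable[[list[str]], bool]) -> str:
    """Halve the candidate PMDA list until one suspect remains.

    hangs(enabled) is a stand-in for one manual test run: it should return
    True if the system locks up with only `enabled` configured in pmcd.conf.
    Assumes a single, reproducible culprit.
    """
    candidates = list(pmdas)
    if not candidates:
        raise ValueError("no PMDAs to test")
    while len(candidates) > 1:
        mid = len(candidates) // 2
        half = candidates[:mid]
        if hangs(half):
            candidates = half             # culprit is in the enabled half
        else:
            candidates = candidates[mid:]  # survived: culprit is in the other half
    return candidates[0]
```

With 11 PMDAs this needs at most 4 test runs (11 → 6 → 3 → 2 → 1 candidates in the worst case). If the hang only appears when two PMDAs interact, plain halving can point at the wrong one, so a confirming run with just the suspect enabled is worthwhile.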


