
Re: [pcp] pmlogger_check stuck if host is down

To: Nathan Scott <nathans@xxxxxxxxxx>
Subject: Re: [pcp] pmlogger_check stuck if host is down
From: Rares Vernica <rvernica@xxxxxxxxx>
Date: Mon, 13 Mar 2017 13:29:15 -0700
Cc: pcp@xxxxxxxxxxx
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <1798358557.43980562.1461899036387.JavaMail.zimbra@xxxxxxxxxx>
References: <CALQ9KxCa75FNi0RY7rfSrQjJh=L33mPQWZpQpgGy2quPE+cimQ@xxxxxxxxxxxxxx> <1798358557.43980562.1461899036387.JavaMail.zimbra@xxxxxxxxxx>
Hi Nathan,

Thanks for your pointers. I looked more into this.

On Thu, Apr 28, 2016 at 8:03 PM, Nathan Scott <nathans@xxxxxxxxxx> wrote:
> ----- Original Message -----
> > [...]
> > If one of the remote hosts is down, pmlogger_check gets stuck on that host
> > and takes about 30 min to move on. I ran pmlogger_check with -VV and the
> > output looks like:
> >
> > [...]
> > > ps ax | grep pml
>
> (any pmprobe processes running OOC?  that grep would have excluded 'em, but
> I wonder if thats where the blockage is)

While pmlogger_check was stuck on the host that is down, I ran ps to check for pmprobe. I ran ps every 3-4 seconds, and here is the output:

# ps ax | grep bb-02
19974 pts/0    S+     0:00 /bin/sh /usr/libexec/pcp/bin/pmlogconf -r -c -q -h bb-02 /tmp/pcp.SH9Zh8Psb/pmlogger
26891 pts/0    S+     0:00 /bin/sh /usr/libexec/pcp/bin/pmlogconf-setup -h bb-02 /var/lib/pcp/config/pmlogconf/sgi/numa
26903 pts/0    S+     0:00 /bin/sh /usr/libexec/pcp/bin/pmlogconf-setup -h bb-02 /var/lib/pcp/config/pmlogconf/sgi/numa
26906 pts/0    S+     0:00 pmprobe -h bb-02 -v origin.numa.routerload
# ps ax | grep bb-02
19974 pts/0    S+     0:00 /bin/sh /usr/libexec/pcp/bin/pmlogconf -r -c -q -h bb-02 /tmp/pcp.SH9Zh8Psb/pmlogger
26921 pts/0    S+     0:00 /bin/sh /usr/libexec/pcp/bin/pmlogconf-setup -h bb-02 /var/lib/pcp/config/pmlogconf/sgi/numa-summary
26933 pts/0    S+     0:00 /bin/sh /usr/libexec/pcp/bin/pmlogconf-setup -h bb-02 /var/lib/pcp/config/pmlogconf/sgi/numa-summary
26936 pts/0    S+     0:00 pmprobe -h bb-02 -v origin.numa.migr.intr.total
# ps ax | grep bb-02
19974 pts/0    S+     0:00 /bin/sh /usr/libexec/pcp/bin/pmlogconf -r -c -q -h bb-02 /tmp/pcp.SH9Zh8Psb/pmlogger
26951 pts/0    S+     0:00 /bin/sh /usr/libexec/pcp/bin/pmlogconf-setup -h bb-02 /var/lib/pcp/config/pmlogconf/sgi/xbow
26963 pts/0    S+     0:00 /bin/sh /usr/libexec/pcp/bin/pmlogconf-setup -h bb-02 /var/lib/pcp/config/pmlogconf/sgi/xbow
26966 pts/0    S+     0:00 pmprobe -h bb-02 -v xbow.nports
# ps ax | grep bb-02
19974 pts/0    S+     0:00 /bin/sh /usr/libexec/pcp/bin/pmlogconf -r -c -q -h bb-02 /tmp/pcp.SH9Zh8Psb/pmlogger
26951 pts/0    S+     0:00 /bin/sh /usr/libexec/pcp/bin/pmlogconf-setup -h bb-02 /var/lib/pcp/config/pmlogconf/sgi/xbow
26963 pts/0    S+     0:00 /bin/sh /usr/libexec/pcp/bin/pmlogconf-setup -h bb-02 /var/lib/pcp/config/pmlogconf/sgi/xbow
26966 pts/0    S+     0:00 pmprobe -h bb-02 -v xbow.nports
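
A minimal sketch of the "run ps every few seconds" polling above, assuming a POSIX shell with ps, grep and cut available. poll_host is a hypothetical helper name, not a PCP tool. It also uses the usual bracket trick so that grep does not match its own command line.

```shell
#!/bin/sh
# Hypothetical helper (not part of PCP): take COUNT snapshots of every
# process whose command line mentions HOST, pausing INTERVAL seconds
# between them, to watch pmlogconf walk through its metric groups.
poll_host() {
    host=$1
    interval=${2:-3}
    count=${3:-10}

    # Build a pattern like "[b]b-02": it still matches "bb-02" in other
    # processes' command lines, but not grep's own, which contains "[b]b-02".
    first=$(printf '%s' "$host" | cut -c1)
    pattern="[$first]${host#?}"

    i=0
    while [ "$i" -lt "$count" ]; do
        echo "--- snapshot $((i + 1)) ---"
        # grep exits 1 when nothing matches; that is not an error here.
        ps ax | grep -- "$pattern" || true
        i=$((i + 1))
        # Sleep between snapshots, but not after the last one.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, `poll_host bb-02 3 20` in a second terminal while pmlogger_check runs would print twenty snapshots roughly three seconds apart.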

So I can see that pmlogconf is making progress, but very slowly: it takes more than 8 minutes to get through the metric groups for just one host. Is there a way to short-circuit it when the host is down and the remote pmcd has not responded to the first metric probe?
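
One way to picture the short-circuit being asked about is a single pre-flight probe per host, with a hard time limit, before any per-group pmlogconf work starts. The sketch below is hypothetical: pmcd_alive and check_hosts are illustrative names, not part of pmlogger_check or pmlogconf. It assumes pmprobe exits non-zero when it cannot connect to pmcd and that GNU coreutils timeout(1) is available. pmcd.version is a metric exported by pmcd itself.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of pmlogger_check or pmlogconf):
# ask pmcd on HOST for one metric, with a hard time limit, and report
# through the exit status whether it answered.
pmcd_alive() {
    host=$1
    limit=${2:-10}   # seconds; an assumed, tunable value
    # Assumption: pmprobe exits non-zero when it cannot connect to pmcd.
    # timeout(1) exits 124 if the limit is hit, and 127 if pmprobe is not
    # installed; every non-zero status is treated as "down".
    timeout "$limit" pmprobe -h "$host" -v pmcd.version >/dev/null 2>&1
}

# Hypothetical loop: only hosts that pass the single probe would go on to
# the expensive per-group work; unreachable hosts cost one probe, not one
# timeout per pmlogconf group.
check_hosts() {
    for host in "$@"; do
        if pmcd_alive "$host" 5; then
            echo "$host: pmcd responding"
        else
            echo "$host: pmcd not responding, skipping" >&2
        fi
    done
}
```

Separately, PCP documents PMCD_CONNECT_TIMEOUT and PMCD_REQUEST_TIMEOUT environment variables for client-side timeouts (see PCPIntro(1)). Since child processes inherit the environment, lowering them before running pmlogger_check might shorten each pmprobe wait, though this has not been tested here.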

Thanks!
Rares
