
Re: [pcp] pmlogger_check stuck if host is down

To: Nathan Scott <nathans@xxxxxxxxxx>
Subject: Re: [pcp] pmlogger_check stuck if host is down
From: Rares Vernica <rvernica@xxxxxxxxx>
Date: Tue, 3 May 2016 12:09:43 -0700
Cc: pcp@xxxxxxxxxxx
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <639232678.44816362.1462256596580.JavaMail.zimbra@xxxxxxxxxx>
References: <CALQ9KxCa75FNi0RY7rfSrQjJh=L33mPQWZpQpgGy2quPE+cimQ@xxxxxxxxxxxxxx> <639232678.44816362.1462256596580.JavaMail.zimbra@xxxxxxxxxx>
On Thu, Apr 28, 2016 at 8:03 PM, Nathan Scott <nathans@xxxxxxxxxx> wrote:
> > [...]
> > If one of the remote hosts is down, pmlogger_check gets stuck on that host
> > and takes about 30 min to move on. I ran pmlogger_check with -VV and the
> > output looks like:
> >
> > [...]
> > > ps ax | grep pml
>
> (any pmprobe processes running OOC?  that grep would have excluded 'em, but
> I wonder if that's where the blockage is)

Yes, pmprobe is running as well:

> ps ax | grep pmp
30792 ?        S      0:00 pmprobe -h b-02 -v apache.total_accesses

I checked the config.remote file referenced in the control file, and it does not contain any apache metrics:

> grep apache /var/lib/pcp/config/pmlogger/config.remote
#+ apache/processes:x::
#+ apache/summary:x::
#+ apache/uptime:x::

Is pmprobe checking all the metrics, regardless of whether they appear in the config.remote file?
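
For reference, the three apache lines in the grep output all share the shape "#+ group:state:interval:". Below is a minimal shell sketch for listing such entries. The function name list_group_states is hypothetical, and reading the second field as a per-group state flag is an assumption based only on the output above, not something confirmed in this thread.

```shell
# Hypothetical helper: print "group state" for every "#+ group:state:..." line.
# Assumes the line layout seen in the grep output above.
list_group_states() {
    config="${1:-/var/lib/pcp/config/pmlogger/config.remote}"
    # Split on ":"; strip the leading "#+ " marker from the first field.
    awk -F: '/^#[+] / { sub(/^#[+] /, "", $1); print $1, $2 }' "$config"
}
```

Running "list_group_states | grep apache" against the same file should show whether every apache group carries the same flag.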

On Mon, May 2, 2016 at 11:23 PM, Nathan Scott <nathans@xxxxxxxxxx> wrote:
> >
> > I have pmlogger collect logs from multiple hosts. My control file looks
> > something like this:
> >
>
> Does your /etc/pcp/pmlogger/control file contain PMCD_CONNECT_TIMEOUT=150?
> (pretty sure it will, as that oddly seems to be the default currently)

Yes, it was set to the default 150. I reset it to 2. The entire check takes about 8 min now. I guess your other fixes would make the situation better.
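
To double-check which timeout value is actually in the control file, something like the sketch below may help. The function name show_connect_timeout is hypothetical, the default path is the control file mentioned above, and the sketch assumes the setting is written as a plain PMCD_CONNECT_TIMEOUT=<number> assignment on an uncommented line.

```shell
# Hypothetical helper: print each PMCD_CONNECT_TIMEOUT value assigned in a control file.
show_connect_timeout() {
    control="${1:-/etc/pcp/pmlogger/control}"
    # Skip comment lines, then extract the digits that follow the assignment.
    grep -v '^[[:space:]]*#' "$control" |
        sed -n 's/.*PMCD_CONNECT_TIMEOUT=\([0-9][0-9]*\).*/\1/p'
}
```

Since PCP client tools also read PMCD_CONNECT_TIMEOUT from the environment, timing one probe against the down host, for example "time env PMCD_CONNECT_TIMEOUT=2 pmprobe -h b-02 -v apache.total_accesses", should give a rough idea of how long each probe can block.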

On Sat, Apr 30, 2016 at 4:57 AM, Frank Ch. Eigler <fche@xxxxxxxxxx> wrote:
>
> For comparison, if you were to use pmmgr to manage the remote pmloggers,
> you could drop those lines from the pmlogger/control file, and instead:

Thanks for the suggestion! I will check that as well.


Thanks!
Rares