Hi Rares,
----- Original Message -----
> Hello,
>
> I have pmlogger collect logs from multiple hosts. My control file looks
> something like this:
>
Does your /etc/pcp/pmlogger/control file contain PMCD_CONNECT_TIMEOUT=150?
(pretty sure it will, as that oddly seems to be the default currently)
From digging into this a bit further, that's definitely a big part of the
problem. I've changed it so that we don't set that by default now, which
was always the intention.
It looks like pmlogconf is also not helping here: it now iterates over many
metrics, running pmprobe on each (observed here to introduce a noticeable
delay for a downed host too). I've added a one-trip guard there, so it now
fails quickly and moves on.
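
To illustrate the idea behind the one-trip guard, here is a rough sketch in
Python. This is not the actual pmlogconf change (pmlogconf is a shell script);
probe_metrics, is_reachable and probe are hypothetical names used only for
illustration. The point is that an unreachable host costs one timeout, not one
timeout per metric.

```python
from typing import Callable, Dict, Iterable


def probe_metrics(
    host: str,
    metrics: Iterable[str],
    is_reachable: Callable[[str], bool],
    probe: Callable[[str, str], bool],
) -> Dict[str, bool]:
    """Probe each metric on host, but only after one up-front contact check.

    is_reachable and probe are hypothetical callables supplied by the caller;
    they stand in for whatever actually talks to the remote host.
    """
    # One-trip guard: pay the connection timeout at most once per host.
    if not is_reachable(host):
        return {}
    # Host answered, so per-metric probing is worth doing.
    return {metric: probe(host, metric) for metric in metrics}
```

With a downed host, is_reachable fails once and the per-metric loop is never
entered.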
>
> It seems that pmlogconf is causing the delay. I am not sure what is happening
> but it does not look right. Any thoughts?
>
There's one other contributing factor in libpcp - multiple network addresses
from getaddrinfo(3) causing unexpectedly long connection timeouts - but even
without tackling that, those earlier changes should resolve a large part of
the problem you've observed.
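
As a rough sketch of why multiple addresses matter: if each address returned
by getaddrinfo(3) gets the full connect timeout, the worst case is the timeout
multiplied by the number of addresses. One possible way to bound that (an
assumption on my part, not what libpcp does) is to share a single deadline
across all attempts. connect_with_deadline and its parameters are hypothetical
names for illustration.

```python
import socket
import time


def connect_with_deadline(host, port, total_timeout,
                          resolve=socket.getaddrinfo, clock=time.monotonic):
    """Try each resolved address in turn under one shared time budget.

    resolve and clock are injectable only so the sketch can be exercised
    without a network; by default they are the standard library calls.
    """
    deadline = clock() + total_timeout
    last_error = None
    for family, socktype, proto, _canon, sockaddr in resolve(
            host, port, type=socket.SOCK_STREAM):
        remaining = deadline - clock()
        if remaining <= 0:
            break  # budget spent: do not start another attempt
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(remaining)  # this attempt gets only what is left
        try:
            sock.connect(sockaddr)
            return sock
        except OSError as exc:
            last_error = exc
            sock.close()
    raise last_error or TimeoutError("no address connected within the budget")
```

Without the shared deadline, a host resolving to several unreachable addresses
would wait out the full timeout once per address.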
Those changes are in the git master branch now if you'd like to try them.
Alternatively, without upgrading, you could edit your pmlogger/control file
directly to reduce those default timeouts.
cheers.
--
Nathan