automatic derived metrics slowing down remote pcp clients, esp. pmlogconf

To: pcp developers <pcp@xxxxxxxxxxx>, mgoodwin@xxxxxxxxxx
Subject: automatic derived metrics slowing down remote pcp clients, esp. pmlogconf
From: "Frank Ch. Eigler" <fche@xxxxxxxxxx>
Date: Sat, 14 May 2016 15:39:45 -0400
Delivered-to: pcp@xxxxxxxxxxx
User-agent: Mutt/1.4.2.2i
Hi -

pmlogconf is used by service-pmlogger (intermittently) and
service-pmmgr (frequently).  It has recently gotten much much slower,
and I finally figured out why.  It's the derived metrics processing.
pmlogconf involves about a hundred pmprobe calls.  Each pmprobe is
supposed to do just one fetch on a given metric to see if it exists.
That should take only a couple of packets to the remote pmcd.

But with the new derived-metric auto-loading, libpcp/src/derive.c
engages in quite a chatter with pmcd.  For every occurrence of every
metric in the derived-metric config files (just iostat.conf for now),
it sends a separate pmLookupName and pmLookupDesc call.  Each of those
costs O(milliseconds) plus kernel-level context switches.  That results
in 80 pmprobes * 100 pmXmitPDUs each: about 8000 packets, each
involving context switches and (too many) sendto/select/recvfrom
syscalls, when about a hundred would do.
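
To make the per-occurrence cost concrete, here is a toy counting model
in Python.  It is only a sketch: the three derived-metric definitions
are made up for illustration (they are not copied from iostat.conf),
the tokenizer is far cruder than the real parser in derive.c, and
"round trip" just means one request/response exchange with pmcd.  It
compares one name lookup plus one descriptor lookup per operand
occurrence against a single batched name lookup followed by one
descriptor lookup per distinct metric.

```python
import re

# Hypothetical derived-metric definitions, for illustration only.
CONFIG = """
demo.read_rate = rate(disk.dev.read)
demo.write_rate = rate(disk.dev.write)
demo.total_rate = rate(disk.dev.read) + rate(disk.dev.write)
"""

# Function names that may appear in expressions but are not metrics.
# (Assumed subset; the real expression grammar has more.)
FUNCTIONS = {"rate", "delta", "sum", "avg", "min", "max", "count"}


def operand_occurrences(config):
    """Return every metric-name occurrence on the right-hand sides."""
    names = []
    for line in config.strip().splitlines():
        _, expr = line.split("=", 1)
        for token in re.findall(r"[A-Za-z_][A-Za-z0-9_.]*", expr):
            if token not in FUNCTIONS:
                names.append(token)
    return names


def per_occurrence_round_trips(names):
    """One name lookup plus one descriptor lookup per occurrence."""
    return 2 * len(names)


def two_pass_round_trips(names):
    """One batched name lookup, then one descriptor lookup per
    distinct metric (descriptor lookups are not batched here)."""
    return 1 + len(set(names))


if __name__ == "__main__":
    names = operand_occurrences(CONFIG)
    print("occurrences:", len(names), "distinct:", len(set(names)))
    print("per-occurrence round trips:", per_occurrence_round_trips(names))
    print("two-pass round trips:", two_pass_round_trips(names))
```

With these made-up definitions there are four operand occurrences but
only two distinct metrics, so the model counts 8 round trips for the
per-occurrence approach and 3 for the two-pass one.  The gap widens
as the same metrics recur across more definitions.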

So we get 100-200ms per pmprobe instead of 2ms.  That adds up to
multiple seconds - even to minutes (!) on a VM with inefficient
networking - per pmlogconf, and a great deal of sys% cpu time.
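
The arithmetic behind those figures, as a small Python sketch.  It
takes the 80 probes, 100 PDUs per probe, and the 2ms / 100-200ms
per-probe timings quoted here at face value; none of it is a
measurement, and the function name is made up.

```python
N_PROBES = 80          # pmprobe invocations per pmlogconf run
PDUS_PER_PROBE = 100   # PDUs sent per pmprobe with derived metrics loaded


def run_seconds(n_probes, ms_per_probe):
    """Total wall-clock time of one run, probes executed serially."""
    return n_probes * ms_per_probe / 1000.0


if __name__ == "__main__":
    print("PDUs per run:", N_PROBES * PDUS_PER_PROBE)
    print("expected   (2 ms/probe):  ", run_seconds(N_PROBES, 2), "s")
    print("observed (100 ms/probe):  ", run_seconds(N_PROBES, 100), "s")
    print("observed (200 ms/probe):  ", run_seconds(N_PROBES, 200), "s")
```

That is 8000 PDUs and roughly 8 to 16 seconds per run, against about
0.16 seconds if each probe took the expected 2ms.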


So, what to do?  Some options:

- nothing, bletch

- redefine pmlogconf to exclude derived metrics, and have it set
  env PCP_DERIVED_CONFIG="" for itself / pmprobe

- have pmmgr set it when it invokes pmlogconf, let others suffer

- improve libpcp/src/derive.c to cache/batch its lookups; this would
  require a two-pass configuration file parsing process; pmLookupName
  is batchable but the PMAPI/PDUs lack a batched pmLookupDesc

- improve pmlogconf to batch its pmprobe invocations

- (improve pmlogconf to parallelize its pmprobe invocations ... might
  only aid latency, not load, so this is an unsuggestion)
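
For the two PCP_DERIVED_CONFIG options above, the mechanics on the
caller's side would look roughly like this Python sketch.  It assumes
that an empty PCP_DERIVED_CONFIG suppresses the automatic
derived-metric loading in the child process, as those options
propose; the helper names are made up, and pmlogconf itself is a
shell script, so this only illustrates the idea.

```python
import os
import subprocess


def env_without_derived_metrics(base=None):
    """Copy an environment and blank out PCP_DERIVED_CONFIG.

    The caller's environment is left untouched; only the child
    process sees the empty setting.
    """
    env = dict(os.environ if base is None else base)
    env["PCP_DERIVED_CONFIG"] = ""
    return env


def probe(metric, host="localhost"):
    """Run pmprobe for one metric against a remote pmcd.

    Requires pmprobe on PATH; -h selects the pmcd host.
    """
    return subprocess.run(
        ["pmprobe", "-h", host, metric],
        env=env_without_derived_metrics(),
        capture_output=True,
        text=True,
        check=False,
    )
```

Setting the variable only in the child's environment keeps the change
local to pmlogconf / pmprobe, so other clients on the same machine
still see derived metrics.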


- FChE
