Hi -
pmlogconf is used by service-pmlogger (intermittently) and
service-pmmgr (frequently). It has recently gotten much much slower,
and I finally figured out why. It's the derived metrics processing.
pmlogconf involves about a hundred pmprobe calls. Each pmprobe is
supposed to do just one fetch on a given metric to see if it exists.
That should take only a couple of packets to the remote pmcd.
But with the new derived-metric auto-loading, libpcp/src/derive.c
engages in quite a chatter with pmcd. For every occurrence of every
metric in the derived-metric.conf file (just iostat.conf for now), it
sends a separate pmLookupName and pmLookupDesc call. Each of those
takes O(milliseconds) plus kernel-level context switches. That results
in 80 pmprobes * 100 pmXmitPDUs each: about 8000 packets, each
involving context switches and (too many) sendto/select/recvfrom
syscalls, when about a hundred would do.
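
To make the per-occurrence cost concrete, here is a rough Python sketch. It assumes a made-up three-line derived-metric config (not the real iostat.conf) and a simplistic regex for spotting metric names; it only counts round trips, it does not talk to pmcd.

```python
import re

# Hypothetical derived-metric definitions, for illustration only
# (these are NOT the contents of the real iostat.conf).
CONFIG = """
demo.read_pct = 100 * disk.dev.read / disk.dev.total
demo.write_pct = 100 * disk.dev.write / disk.dev.total
demo.avg_size = disk.dev.total_bytes / disk.dev.total
"""

# Simplified pattern for a dotted metric name; the real parser is stricter.
METRIC_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z0-9_]+)+")

def operand_metrics(config):
    """Return every metric-name occurrence on the right-hand sides."""
    names = []
    for line in config.strip().splitlines():
        _, expr = line.split("=", 1)
        names += METRIC_RE.findall(expr)
    return names

occurrences = operand_metrics(CONFIG)

# One name lookup plus one descriptor lookup for every occurrence.
naive_round_trips = 2 * len(occurrences)

# The same two lookups, but only once per distinct metric name.
cached_round_trips = 2 * len(set(occurrences))

print(naive_round_trips, cached_round_trips)
```

Even in this toy case the repeated operand (disk.dev.total) is looked up three times instead of once; a larger config with more shared operands widens the gap.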
So we get 100-200ms per pmprobe instead of 2ms. That adds up to
multiple seconds - even to minutes (!) on a VM with inefficient
networking - per pmlogconf, and a great deal of sys% cpu time.
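
A back-of-envelope check of those numbers, as a small Python sketch. The constants are just the rough figures quoted above, not new measurements.

```python
# Rough figures taken from the text above; not measurements.
PROBES = 80              # pmprobe invocations per pmlogconf run
PDUS_PER_PROBE = 100     # PDUs each pmprobe currently sends
SLOW_MS = (100, 200)     # observed per-pmprobe latency range, in ms
FAST_MS = 2              # expected per-pmprobe latency, in ms

# Total PDUs sent during one pmlogconf run.
total_pdus = PROBES * PDUS_PER_PROBE

# Wall-clock time per pmlogconf run, in seconds, if pmprobes run serially.
slow_total_s = tuple(PROBES * ms / 1000 for ms in SLOW_MS)
fast_total_s = PROBES * FAST_MS / 1000

print(total_pdus, slow_total_s, fast_total_s)
```

That works out to 8000 PDUs and roughly 8 to 16 seconds per run on a healthy network, against about 0.16 seconds if each pmprobe took 2ms.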
So, what to do? Some options:
- nothing, bletch
- redefine pmlogconf to exclude derived metrics, and have it set
env PCP_DERIVED_CONFIG="" for itself / pmprobe
- have pmmgr set it when it invokes pmlogconf, let others suffer
- improve libpcp/src/derive.c to cache/batch its lookups; this would
require a two-pass configuration file parsing process; pmLookupName
is batchable but the PMAPI/PDUs lack a batched pmLookupDesc
- improve pmlogconf to batch its pmprobe invocations
- (improve pmlogconf to parallelize its pmprobe invocations ... might
only aid latency, not load, so this is an unsuggestion)
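
For the PCP_DERIVED_CONFIG="" options, the caller side could look something like the Python sketch below. The helper names (env_without_derived_metrics, run_pmlogconf) are hypothetical, and pmmgr itself is not written in Python; this only illustrates passing an emptied variable down to pmlogconf and the pmprobe processes it spawns.

```python
import os
import subprocess

def env_without_derived_metrics(base=None):
    """Copy an environment and set PCP_DERIVED_CONFIG to the empty string."""
    env = dict(os.environ if base is None else base)
    env["PCP_DERIVED_CONFIG"] = ""
    return env

def run_pmlogconf(args):
    """Hypothetical wrapper: run pmlogconf with derived metrics disabled.

    The arguments are passed through untouched; child processes such as
    pmprobe inherit the modified environment.
    """
    return subprocess.run(
        ["pmlogconf", *args],
        env=env_without_derived_metrics(),
        check=False,
    )
```

The rest of the caller's environment is left alone, so only the derived-metric loading changes.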
- FChE