
Re: [pcp] Debugging sigpipe in pmda

To: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>, <pcp@xxxxxxxxxxx>
Subject: Re: [pcp] Debugging sigpipe in pmda
From: Jeff Hanson <jhanson@xxxxxxx>
Date: Tue, 16 Aug 2016 17:45:37 -0400
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <83f3710f-d758-6f7d-d9af-480fb897f4c8@xxxxxxxxxxxxxxxx>
References: <df62753e-0d3d-3626-cd6e-ed1f8e17fd2e@xxxxxxx> <1831980510.1015515.1470956662271.JavaMail.zimbra@xxxxxxxxxx> <b735a150-5aa2-04f0-d9df-f4e8eb699c19@xxxxxxx> <y0m1t1ophga.fsf@xxxxxxxx> <83f3710f-d758-6f7d-d9af-480fb897f4c8@xxxxxxxxxxxxxxxx>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.2
On 08/16/2016 05:23 PM, Ken McDonell wrote:
On 17/08/16 06:49, Frank Ch. Eigler wrote:
...

G'day Frank,

I agree on the strace for PMDA not pmcd comments.

...
Come to think of it, there are few PMDAs that have NOT been hit by
this issue at some point.  I wonder if it's time that a more systemic
solution be invented (not just restarting timed-out pmdas).

But I think this assertion is not correct ... there are in fact very few PMDAs 
that have hit this issue, specifically there are 81 PMDAs in the current source 
tree and very few of these have triggered PDU timeout issues for pmcd.  The 
most notable and long-standing cases are the DBMS PMDAs where SQL queries are 
used.

And the "solution" is a standard one ...

If the source of the metrics cannot answer the "gimme the values" request from 
pmcd in less than 5 seconds then that source cannot pretend to be able to deliver 
real-time data (which is the basic assumption in the way pmcd interacts with clients and 
PMDAs).


The behavior seems only to have started when I increased the number of systems
from which metrics could be fetched.  In my two tests it took 20 minutes and
then 30+ minutes to replicate with two pmval <cluster metric> running at
default sample rate.


If this is the case, then the PMDA developer must adopt a multi-threaded 
caching approach where one thread is timer driven and periodically updates the 
cache of metric values while another thread is PDU driven and services requests 
from pmcd using the most recently cached values. This is a standard template 
that does not touch any of the PCP APIs.

This approach reduces the quality of the data (in terms of timeliness) and adds 
overhead (the refreshing thread runs even if no client of pmcd is requesting 
the data).  And for these reasons this is not the preferred PMDA architecture 
if it can be avoided.
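
A minimal sketch of the caching template described above, in Python and
deliberately not touching any PCP API.  All names here (CachedSource,
slow_fetch, interval) are hypothetical and chosen only for illustration; a real
PMDA would call fetch() from its own PDU handling code.  One thread refreshes
the cache on a timer, while the request path only ever reads the most recently
cached values, so it can answer well inside the timeout even when the
underlying source is slow.

```python
import threading
import time


class CachedSource:
    """Timer-driven cache in front of a slow metric source (hypothetical names)."""

    def __init__(self, slow_fetch, interval):
        self._slow_fetch = slow_fetch   # callable that may block for a long time
        self._interval = interval       # seconds to wait between refreshes
        self._lock = threading.Lock()
        self._values = {}               # most recently cached values
        self._stamp = None              # monotonic time of last good refresh
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._refresh_loop, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _refresh_loop(self):
        # Refresher thread: the only place the slow source is queried.
        while not self._stop.is_set():
            try:
                fresh = self._slow_fetch()
            except Exception:
                fresh = None            # on failure, keep the previous values
            if fresh is not None:
                with self._lock:
                    self._values = dict(fresh)
                    self._stamp = time.monotonic()
            self._stop.wait(self._interval)

    def fetch(self):
        # Request path: reads the cache only, never waits on the slow source.
        with self._lock:
            return dict(self._values), self._stamp
```

The timestamp returned by fetch() lets the caller judge how stale the values
are, which is the timeliness cost mentioned above.  Note also that the
refresher keeps running whether or not anyone calls fetch(), which is the
overhead cost.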


What then would be the preferred architecture?
--
-----------------------------------------------------------------------
Jeff Hanson - jhanson@xxxxxxx - Senior Technical Support Engineer

You can choose a ready guide in some celestial voice.
If you choose not to decide, you still have made a choice.
You can choose from phantom fears and kindness that can kill;
I will choose a path that's clear
I will choose freewill. - Peart
