Re: [pcp] pcp update: json pmda

To: Mark Goodwin <mgoodwin@xxxxxxxxxx>, "Frank Ch. Eigler" <fche@xxxxxxxxxx>
Subject: Re: [pcp] pcp update: json pmda
From: Nathan Scott <nathans@xxxxxxxxxx>
Date: Tue, 9 Jun 2015 21:56:14 -0400 (EDT)
Cc: pcp developers <pcp@xxxxxxxxxxx>
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <5576D75E.2010205@xxxxxxxxxx>
References: <20150609004114.GA9357@xxxxxxxxxx> <557641A2.7040801@xxxxxxxxx> <20150609020038.GB9357@xxxxxxxxxx> <5576D75E.2010205@xxxxxxxxxx>
Reply-to: Nathan Scott <nathans@xxxxxxxxxx>
Thread-index: IGZrAhO4AfyaO2U2EICcJaxHp0Lxbg==
Thread-topic: pcp update: json pmda

----- Original Message -----
> > pairs, where A-B traffic might appear then disappear then later
> > reappear).  What pmdaCacheOp sequence would you recommend?
> 

Whatever sequence caused investigation of the original problem would
do the trick.  It doesn't need to be complex, and doesn't need to hit
all possible code paths/situations.  This sounds like something that
could be readily automated using a custom pmdajson data source?  Or,
whatever you used to expose the problem & test the fix by hand, can
that be turned into a shell script?  That'd get the first part sorted
(automated reproduction of the fundamental aspects of the problem),
then...

> >> Also, some qa to demonstrate the issue and the fix would be
> >> appropriate, especially at this stage of the release.
> >
> > The problem showed up with dramatic slowdowns and lots of diagnostic
> > I/O traffic into /var/tmp and the pmda .log file, not as differences
> > at the pmapi client level (other than sloth).  How would one qa that?

Once the above (even simple) reproducer exists, the test could just
do _pmda_filter_logfile ...  If no logged error messages exist, the
test demonstrates the fix, and (AIUI) this slowdown can no longer
happen as a result of that log() call.

This has the nice property that problems unanticipated at this stage
(i.e. other logged errors, perhaps from a json library, perhaps from
elsewhere in the PMDA) would be detected in the future with no change
to the test, since they would also show up in the filtered log.
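
As a rough illustration of that kind of check (this is not the QA
suite's _pmda_filter_logfile helper, whose behaviour isn't shown in
this thread), a minimal Python sketch could look like the following.
The regular expression, the function names and the line-count limit
are all assumptions made for the example; the real messages pmdajson
writes are not quoted here.

```python
import re

# Assumed pattern: the actual pmdajson log messages are not quoted in
# this thread, so this just matches generically suspicious words.
SUSPECT = re.compile(r"error|warning|fail", re.IGNORECASE)


def suspicious_lines(log_text):
    """Return the log lines that match the assumed error pattern."""
    return [line for line in log_text.splitlines() if SUSPECT.search(line)]


def log_is_quiet(log_text, max_total_lines=50):
    """True if the log has no suspicious lines and has not grown large.

    max_total_lines is an arbitrary, hypothetical threshold standing in
    for "the log doesn't grow too much".
    """
    total = len(log_text.splitlines())
    return not suspicious_lines(log_text) and total <= max_total_lines


if __name__ == "__main__":
    sample = "Log for pmdajson\nstarting up\nError: cache store failed\n"
    print(suspicious_lines(sample))
    print(log_is_quiet(sample))
```

Because it matches broadly rather than on one specific message, a
check of this shape would also trip on other logged errors, which is
the property described above.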

> maybe capture an archive with a lot of instances coming and going in
> a 'wildly fluctuating' manner as above, e.g. a script that generates
> json data from /dev/random modulo 1000000, with churn of say half
> of them re-appearing or disappearing between fetches? Check the log
> doesn't grow too much, and we get PM_ERR_INST when expected, etc.

I don't think archives are going to help us here; AIUI this is more
of a (live) pmdajson logging-a-bit-too-much kind of issue?
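
The churn pattern suggested above could still feed a live data
source rather than an archive.  A minimal Python sketch of such a
generator follows, assuming a flat name-to-value JSON object per
fetch; that layout, the instance names and all parameters are
hypothetical, and a real pmdajson source would need whatever format
and metadata the PMDA actually expects.

```python
import json
import random


def churn_snapshots(pool_size=1000, rounds=5, churn=0.5, seed=0):
    """Yield JSON documents whose set of instances churns each round.

    Between rounds, int(pool_size * churn) names from the pool have
    their membership flipped: present ones disappear and absent ones
    (re)appear.  The flat {"instN": value} layout is an assumption.
    """
    rng = random.Random(seed)  # seeded so a test run is repeatable
    names = [f"inst{i}" for i in range(pool_size)]
    active = set(rng.sample(names, pool_size // 2))
    for _ in range(rounds):
        yield json.dumps({n: rng.randrange(1000000) for n in sorted(active)})
        flip = rng.sample(names, int(pool_size * churn))
        active.symmetric_difference_update(flip)


if __name__ == "__main__":
    for doc in churn_snapshots(pool_size=10, rounds=3):
        print(doc)
```

A fixed seed is used instead of /dev/random so that the same
appear/disappear/reappear sequence is produced on every run.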

> Anyway - it seems to me the patch is an improvement, and we should
> just pull it in if nobody disagrees.

It seems like it should be straightforward to do a basic check here
that we are not spamming json.log for this (and perhaps other) indom
access patterns.

cheers.

--
Nathan
