
RE: pmlogger -u questions

To: "'Frank Ch. Eigler'" <fche@xxxxxxxxxx>
Subject: RE: pmlogger -u questions
From: "Ken McDonell" <kenj@xxxxxxxxxxxxxxxx>
Date: Mon, 28 Apr 2014 08:48:36 +1000
Cc: "'Nathan Scott'" <nathans@xxxxxxxxxx>, <pcp@xxxxxxxxxxx>
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <y0m1twlsm1i.fsf@xxxxxxxx>
References: <01e901cf56df$4ce97de0$e6bc79a0$@internode.on.net> <1665962954.4723287.1397437104781.JavaMail.zimbra@xxxxxxxxxx> <534B4330.1060008@xxxxxxxxxxxxxxxx> <158034809.5621684.1397540389674.JavaMail.zimbra@xxxxxxxxxx> <005201cf6056$f3de4d80$db9ae880$@internode.on.net> <y0m1twlsm1i.fsf@xxxxxxxx>
Thread-index: AQHyaZW6bsfhxrCcQijEIAaglOJyoQLjeeRRAm7wCyYBhaK/UgJ4eF4BAKxoj7uajNVlsA==
> -----Original Message-----
> From: Frank Ch. Eigler [mailto:fche@xxxxxxxxxx]
> Sent: Friday, 25 April 2014 9:09 PM
> ...
> For added exercise of the metadata/fflush code, this might give a thorough
> workout:
> 
> log mandatory on default {
>      proc
> }
> 
> (with some serious forking/etc. going on in the background).

I would expect this to produce _fewer_ failures in the old code ... because
the indom will change with every fetch, so the metadata file is written on
every fetch, and each metadata write triggers an fflush() of all the output
files.

However, it does expose an issue that remained in the new code (as of
Friday) ... I had deliberately not converted the metadata and index writes
to be unbuffered with one logical record per fwrite(), and of course this
example exposed failures in the killer script (a failure rate of about 1 in
20).

So I've committed changes to make all the archive I/O unbuffered, with one
fwrite() per logical record, which means the flush operations (which were
implemented with fflush() alone) are no longer needed.
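A minimal C sketch (illustrative only, not libpcp code) of why unbuffered streams remove the need for explicit flushes: after setvbuf(..., _IONBF, ...), a single fwrite() is handed to the kernel before it returns on common C libraries such as glibc, so another reader (or a post-mortem after the logger is killed) sees the complete record. The temporary-file path and the 10-byte "record" are assumptions made for the demonstration, and the buffered-case result depends on the C library.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Return the number of bytes a separate reader currently sees in 'path'. */
long visible_size(const char *path)
{
    FILE *r = fopen(path, "rb");
    long n;

    if (r == NULL)
        return -1;
    fseek(r, 0L, SEEK_END);
    n = ftell(r);
    fclose(r);
    return n;
}

/* Write one small record with the stream in the given buffering mode and
 * report how many bytes are visible on disk before any fflush()/fclose(). */
long bytes_before_flush(int mode)
{
    char path[] = "/tmp/archive-demo-XXXXXX";   /* illustrative temp path */
    const char rec[] = "0123456789";            /* stand-in for one logical record */
    int fd;
    FILE *f;
    long n;

    assert(mode == _IONBF || mode == _IOFBF || mode == _IOLBF);
    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    f = fdopen(fd, "wb");
    if (f == NULL) {
        close(fd);
        unlink(path);
        return -1;
    }
    /* setvbuf() must be called before any other operation on the stream */
    setvbuf(f, NULL, mode, BUFSIZ);
    fwrite(rec, 1, sizeof(rec) - 1, f);   /* one fwrite() per logical record */
    n = visible_size(path);               /* checked before any flush */
    fclose(f);
    unlink(path);
    return n;
}
```

Calling bytes_before_flush(_IONBF) should report all 10 bytes, while bytes_before_flush(_IOFBF) typically reports 0 with glibc, because the record is still sitting in the stdio buffer and would be lost if the process were killed at that point.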

What this means is that I've moved the memmove() work from stdio into the
libpcp routines (so no real change in total work), and we now do more
write() calls for small pmResults, pmDesc + name records and small pmInDoms,
but fewer write() calls for (the more common) larger pmResults and larger
pmInDoms.
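The sketch below shows, under stated assumptions, what "one fwrite() per logical record" with the copying moved into the library might look like. The framing is hypothetical (a 32-bit big-endian total length at both ends of the record) and put_record() is an illustrative name, not a libpcp function. memcpy() is used rather than memmove() because the source and destination buffers never overlap.

```c
#include <arpa/inet.h>   /* htonl() */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical framing: [len][payload][len], where len is the total record
 * size stored as a 32-bit big-endian word at both ends.
 * Returns 0 on success, -1 on failure. */
int put_record(FILE *f, const void *payload, size_t plen)
{
    size_t total;
    uint32_t word;
    char *buf;
    size_t n;

    /* the total length must fit in the 32-bit framing word */
    assert(plen <= UINT32_MAX - 2 * sizeof(uint32_t));
    total = plen + 2 * sizeof(uint32_t);
    word = htonl((uint32_t)total);

    buf = malloc(total);
    if (buf == NULL)
        return -1;
    /* the copying that stdio used to do inside its own buffer happens here */
    memcpy(buf, &word, sizeof(word));
    memcpy(buf + sizeof(word), payload, plen);
    memcpy(buf + sizeof(word) + plen, &word, sizeof(word));

    /* a single fwrite() for the whole logical record; on an _IONBF stream
     * the data is passed through to write() rather than being buffered */
    n = fwrite(buf, 1, total, f);
    free(buf);
    return n == total ? 0 : -1;
}
```

With this shape a small record costs one write() of its own, where a buffered stream might have batched several small records into one write(), but a large record also costs just one write() instead of being split across buffer-sized chunks. That matches the trade-off described above.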

With these changes
(a) there is no real difference in CPU load from the numbers I reported
before
(b) the killer script passes solidly, even for the proc metrics case
