A number of pmlogger_check gripes ...

To: PCP Mailing List <pcp@xxxxxxxxxxx>
Subject: A number of pmlogger_check gripes ...
From: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date: Sat, 06 Jul 2013 14:03:40 +1000
Delivered-to: pcp@xxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130623 Thunderbird/17.0.7
When pmlogger_check.sh was relocated recently in the source tree, all the revision history was lost. Is it possible to revert 499b393 and redo it in a way that keeps the revision history with the file, or is this a git "feature"?

Anyway, the real issue here is commit dc62541, which added pmlogconf to pmlogger_check.sh (I have not checked, but I suspect the same may apply to the related changes made to the pmie control scripts). Deep inside pmlogger_check I found this:

    if $PMLOGCONF -q -h $hostname $tmp/pmlogger

Now pmlogconf is designed to be interactive, so what really happens here depends on where stdin is coming from. As pmlogger_check is usually (but not always) run from cron, stdin is likely to be /dev/null, and we get a sort of default configuration file generated.
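
To make the stdin dependence concrete, here is a minimal sketch of how a script can tell whether it has a terminal on stdin. The function name stdin_kind is made up for illustration and is not part of PCP; the only thing relied on is the standard `[ -t 0 ]` test.

```shell
#!/bin/sh
# Hypothetical helper (not part of PCP): report what kind of stdin we have.
stdin_kind() {
    if [ -t 0 ]; then
        # File descriptor 0 is attached to a terminal, so prompting is possible.
        echo "terminal"
    else
        # No terminal on stdin; redirecting from /dev/null below has the same effect.
        echo "not-a-terminal"
    fi
}

# Redirecting stdin from /dev/null imitates the non-interactive case.
stdin_kind </dev/null
```

Run by hand from a shell without the redirection, the same function would report "terminal", which is why the behaviour differs between an interactive test and the cron run.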

Now, what if the pmlogger configuration file was already crafted by hand using pmlogconf and carefully selecting groups of metrics to be logged? Along comes pmlogger_check and *whack* your pmlogger config file is changed from what you really wanted to something "defaulty". This happens silently. So the sysadmin only finds out when they go to look at an archive to solve a problem ... *honk* no cigar.
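
As an illustration of the "silently" part only, and not as a proposed fix, here is a rough sketch of a wrapper that would at least make such a change visible. The name warn_if_changed and its calling convention are invented for this example, nothing like it exists in pmlogger_check as far as I know, and it assumes mktemp is available.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of PCP): run a command that might rewrite
# a config file, and complain on stderr if the file's contents changed.
# Usage: warn_if_changed <config-file> <command> [args...]
warn_if_changed() {
    config=$1
    shift
    backup=$(mktemp) || return 1
    # Snapshot the config before the command gets a chance to touch it.
    cp "$config" "$backup" || { rm -f "$backup"; return 1; }
    "$@"
    status=$?
    # cmp -s is silent and exits non-zero when the two files differ.
    if ! cmp -s "$config" "$backup"; then
        echo "warning: $config was modified by: $*" >&2
    fi
    rm -f "$backup"
    # Pass the wrapped command's exit status back to the caller.
    return $status
}
```

This only reports the change after the fact; it does nothing to decide whether the change should have been made in the first place.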

This is not a hypothetical Dr No post, it just happened to me on the logging farm for 32 production machines, and the road to recovery is not pretty. Fortunately (!) we had a system crash soon after, so someone was looking at the logs; otherwise it could have been weeks before the snarfoo was noticed.

We need to be a lot smarter about how "automated" stuff is done ... I don't know how to resolve this particular case but the status quo is not even close to acceptable.
