
Re: [pcp] A number of pmlogger_check gripes ...

To: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Subject: Re: [pcp] A number of pmlogger_check gripes ...
From: Nathan Scott <nathans@xxxxxxxxxx>
Date: Sat, 6 Jul 2013 01:46:46 -0400 (EDT)
Cc: PCP Mailing List <pcp@xxxxxxxxxxx>
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <51D7971C.8090504@xxxxxxxxxxxxxxxx>
References: <51D7971C.8090504@xxxxxxxxxxxxxxxx>
Reply-to: Nathan Scott <nathans@xxxxxxxxxx>
Thread-index: xho3c0jxIzWZ7FRRhg1aSnFWrCmBZw==
Thread-topic: A number of pmlogger_check gripes ...

Hi Ken,

----- Original Message -----
> When pmlogger_check.sh was relocated recently in the source tree, all
> the revision history was lost.  Is it possible to revert 499b393 and
> redo it in a way that keeps the revision history with the file, or is
> this a git "feature"?

It was moved there via "git mv", so I'm not sure that there's any other way.
It may be that the history is still there, just not visible with the default
"git log" invocation (not very helpful, I know).
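For anyone wanting to check this, git can usually trace a file's history across a rename when "git log" is given --follow.  The sketch below builds a throwaway repository to show the difference; the directory and file names in it are made up for the demonstration and are not the real PCP source tree paths.

```shell
# Throwaway repo; "old/", "new/" and "check.sh" are invented names.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

mkdir old new
echo 'echo one' > old/check.sh
git add old/check.sh
git commit -q -m "add check.sh"

git mv old/check.sh new/check.sh
git commit -q -m "move check.sh"

# A plain path-limited log only sees commits that touched the new path ...
plain=$(git log --oneline -- new/check.sh | wc -l | tr -d ' ')
# ... while --follow asks git to continue across the detected rename.
followed=$(git log --oneline --follow -- new/check.sh | wc -l | tr -d ' ')

echo "without --follow: $plain commit(s), with --follow: $followed commit(s)"
```

Whether this recovers everything for pmlogger_check.sh depends on git detecting the move as a rename, which works best when the file content was not heavily changed in the same commit.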

> Anyway, the real issue here is commit dc62541 that added pmlogconf to
> pmlogger_check.sh (I have not checked but suspect the same may apply to
> the related changes made to the pmie control scripts). Deep inside

The pmlogger scripts were made to match the pmie scripts in this regard,
using similar logic that those scripts have had for automated pmieconf
invocation for many years.

> pmlogger_check I found this
> 
>      if $PMLOGCONF -q -h $hostname $tmp/pmlogger
> 
> now pmlogconf is designed to be interactive, so what really happens here
> depends on where stdin is coming from.  As this is run from cron usually
> (but not always), that is likely to be /dev/null and we get a sort of
> default configuration file generated.

That was the expected behaviour (not sure why it's a "sort of" default?)
The testing I did for both cron and init (service) invocation indicated
it was (is?) working just fine.
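As a rough illustration of the stdin point, a shell script can tell whether anyone is there to answer prompts by testing if file descriptor 0 is a terminal.  This is only a sketch; stdin_mode is a hypothetical helper and is not part of pmlogger_check or pmlogconf.

```shell
# Hypothetical helper: report whether stdin is attached to a terminal.
stdin_mode() {
    if [ -t 0 ]; then
        echo interactive
    else
        echo non-interactive
    fi
}

# With stdin redirected from /dev/null, as is likely under cron,
# there is no terminal to prompt on.
stdin_mode </dev/null
```

With stdin redirected from /dev/null the helper reports "non-interactive", which is the situation in which an interactive tool has to fall back on its defaults.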

> Now, what if the pmlogger configuration file was already crafted by hand
> using pmlogconf and carefully selecting groups of metrics to be logged?
>   Along comes pmlogger_check and *whack* your pmlogger config file is
> changed from what you really wanted to something "defaulty".  This
> happens silently.  So the sysadmin only finds out when they go to look
> at an archive to solve a problem ... *honk* no cigar.
> 
> This is not a hypothetical Dr No post, it just happened to me on the
> logging farm for 32 production machines and the road to recovery is not
> pretty.  Fortunately (!) we had a system crash soon after so someone was
> looking at the logs, otherwise it could have been weeks before the
> snarfoo was noticed.

Hmm, that was not anticipated at all - really sorry about that. :(
And I appreciate that it was found & reported so quickly - it means we
don't bite others in the same way in the next release.

> We need to be a lot smarter about how "automated" stuff is done ... I
> don't know how to resolve this particular case but the status quo is not
> even close to acceptable.

Fairly straightforward to resolve this, if we add a flag to pmlogconf to
run it in non-interactive, auto-generate/refresh mode.  This can add a new
comment line near the top of the file indicating the date/time of the run
(pmieconf has this already, IIRC), and indicating the fact that it was indeed
auto-generated.  Without that tag, we should leave well alone.  We do leave
well alone non-pmieconf generated configs already, but this was a wrinkle
that I completely didn't anticipate.
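
A minimal sketch of how the check script could honour such a tag follows.  The tag text, the number of header lines searched and both function names are assumptions made for illustration; none of them exist in pmlogconf or pmlogger_check today.

```shell
# Hypothetical tag; the real wording would be whatever pmlogconf ends up writing.
AUTOGEN_TAG='# Auto-generated by pmlogconf'

# Succeeds only if the tag starts one of the first few lines of the file.
is_autogenerated() {
    head -n 5 "$1" 2>/dev/null | grep -q "^$AUTOGEN_TAG"
}

# Print what the check script should do with a given config file.
config_action() {
    if [ ! -f "$1" ]; then
        echo create        # nothing there yet, safe to generate
    elif is_autogenerated "$1"; then
        echo refresh       # we wrote it, so we may rewrite it
    else
        echo leave-alone   # hand-crafted, do not touch
    fi
}
```

The point of the design is that a missing tag always means "hands off", so a config built interactively by a sysadmin is never silently replaced.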

I'll need to navel gaze for a while to ponder whether pmieconf/pmie_check
has had the same problem (for many years).

cheers.

--
Nathan
