By default, pmlogger_check runs at 25 and 55 minutes past each hour, and
pmlogger_daily runs once a day, just past midnight, at 00:10:00.
If both run at the same time they can interfere with each other, so we have a
shared mutex lock file created in each /var/log/pcp/pmlogger/<dir> to stop
them stepping on one another's toes.
All good so far.
Now I've come across a system where there are _so_ _many_ pmloggers running,
collecting very big archives, that pmlogger_daily is still running at 00:25:00
when pmlogger_check starts. This triggers annoying (but benign) cron mail of
the form:
pmlogger_check: Warning: is another PCP cron job running concurrently?
---------- 1 pcp pcp 0 Jul 4 00:25 /var/log/pcp/pmlogger/<somehost>/lock
pmlogger_check [/etc/pcp/pmlogger/control:78]
Warning: failed to acquire exclusive lock
(/var/log/pcp/pmlogger/<somehost>/lock) ...
I want this mail not to be generated, so that any mail from the cron jobs
indicates a real error that needs action.
The proposed fix is to have pmlogger_daily put its pid in
$PCP_RUN_DIR/pmlogger_daily. Then, when pmlogger_check fails to acquire the
lock (which is very rare under normal circumstances), it checks:
1. does $PCP_RUN_DIR/pmlogger_daily exist, and
2. does the process with the pid therein exist as a current "sh" execution?
If yes to both, stay silent; otherwise report the warning as it does today.
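
The two checks could be sketched in POSIX sh roughly as follows. This is only
an illustration of the proposal, not the actual pmlogger_check code: the
function names daily_is_running and report_lock_failure are hypothetical, the
pid file path is passed in as an argument, and the use of "ps -p PID -o comm="
to look up the command name is an assumption about how the "sh" test might be
done.

```shell
#!/bin/sh
# Sketch only: function names and arguments are hypothetical.

# Return 0 if the pid file exists and names a live process whose
# command name is "sh"; return 1 in every other case.
daily_is_running()
{
    _pidfile="$1"
    # check 1: does the pid file exist?
    [ -f "$_pidfile" ] || return 1
    _pid=`cat "$_pidfile" 2>/dev/null`
    # reject empty or non-numeric content
    case "$_pid" in
        ''|*[!0-9]*) return 1 ;;
    esac
    # check 2: is that pid a current "sh" execution?
    # "comm=" asks ps for the command name only, with no header line
    _comm=`ps -p "$_pid" -o comm= 2>/dev/null | sed -e 's/ *$//'`
    case "$_comm" in
        sh|*/sh) return 0 ;;
    esac
    return 1
}

# Intended to be called only after the attempt to take the lock failed.
# Stays silent when pmlogger_daily appears to be the lock holder.
report_lock_failure()
{
    _lockfile="$1"
    _pidfile="$2"
    if daily_is_running "$_pidfile"
    then
        return 0
    fi
    echo "Warning: failed to acquire exclusive lock ($_lockfile) ..." >&2
    return 1
}
```

On the pmlogger_daily side this assumes something like
echo $$ >"$PCP_RUN_DIR/pmlogger_daily" early in the script. Removing the file
on exit is a further assumption, not part of the proposal above. A stale file
whose pid has been reused by an unrelated "sh" process would still suppress
the warning, which seems an acceptable risk for a benign message.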