
pmlogger daily issue

To: pcp <pcp@xxxxxxxxxxx>
Subject: pmlogger daily issue
From: Nathan Scott <nathans@xxxxxxxxxx>
Date: Fri, 6 Aug 2010 14:39:06 +1000 (EST)
Hi all,

Hit a problem on several hosts yesterday, after an upgrade.
What I *think* happened was:
1. upgrade pcp & our added pmdas (via yum/rpm)
2. /etc/init.d/pcp restart
(at this point, remote pmlogger loses connection & stops).
3. pcp restart on *monitor* host too.
(at which point it's running around restarting all the loggers)
4. cron comes in on the monitor host and also begins restarting
while the start script is still doing the same.
5. somehow, despite pmlock et al, we end up with:
...
20100801.index               20100805.09.30.0
20100801.meta                20100805.09.30.index
20100802.0.bz2               20100805.09.30.meta
20100802.index               20100805.09.31.0
20100802.meta                20100805.09.31.index
20100803.0                   20100805.09.31.meta

(the ones on the RHS are of interest). So... two pmloggers running
for the same remote host, the second started within a minute of
the first.

6. Come midnight, the second one "wins" in terms of being
rotated, and the other continues on...

# pmdumplog -l 20100805.09.30
Log Label (Log Format Version 2)
Performance metrics from host index3
  commencing Thu Aug  5 09:30:26.481 2010
  ending     Fri Aug  6 08:21:41.523 2010
# pmdumplog -l 20100805.09.31
Log Label (Log Format Version 2)
Performance metrics from host index3
  commencing Thu Aug  5 09:31:19.404 2010
  ending     Fri Aug  6 00:36:49.345 2010

I have a -T safeguard on the pmloggers, which I think is
what eventually stopped this runaway (there was no manual
intervention from me, anyway).


... not good!?  The only theory I can come up with is that we are
somehow racing and starting two loggers. Because we have fairly
extensive logger configs (hundreds of metrics), startup can take
a while, and pmlogger.c only creates the control ports in
/var/tmp/pmlogger just before entering its main loop. Should that
perhaps move earlier, before sending any (potentially remote)
PDU traffic to pmcd?  (Isn't a pmlock held somewhere to prevent
all this, though?)  Does anyone have a more plausible theory...?

cheers.

-- 
Nathan
