Re: Possible pmmgr issue?

To: "Frank Ch. Eigler" <fche@xxxxxxxxxx>, Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Subject: Re: Possible pmmgr issue?
From: Nathan Scott <nathans@xxxxxxxxxx>
Date: Thu, 13 Feb 2014 18:21:26 -0500 (EST)
Cc: pcp developers <pcp@xxxxxxxxxxx>
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <20140213125811.GG11820@xxxxxxxxxx>
References: <1952377955.6159460.1392281163287.JavaMail.zimbra@xxxxxxxxxx> <1444843200.6174759.1392282098405.JavaMail.zimbra@xxxxxxxxxx> <20140213125811.GG11820@xxxxxxxxxx>
Reply-to: Nathan Scott <nathans@xxxxxxxxxx>
Thread-index: HOpbPc/o4bhv/5nU+trHjJVGBcwing==
Thread-topic: Possible pmmgr issue?

----- Original Message -----
> > [...]
> > Note the size of the 2 merge archives.  I need to clean this up,
> > as my rootfs is out of space once more.  Any suggestions on what
> > to look for before I do that?  Thanks.
> 
> From what I see, it looks like you just happened to catch the machine
> mid-merge, just after the pmcd was lost & restarted, so pmmgr stopped
> & restarted the pmlogger.

This would've been right after a package upgrade, so the stop/start
there came from the packaging scripts.  Which is an interesting
observation in itself: the uber-log merging happens not just once a
day, but also on this sort of event?  That doesn't seem ideal on the
face of it - is it necessary, or can it be delayed till the wee hours?

> > -rw-r--r-- 1 pcp pcp 546M Feb 13 14:34 merged-archive-20140213.033222.0
> > -rw-r--r-- 1 pcp pcp  85K Feb 13 14:34 merged-archive-20140213.033222.index
> > -rw-r--r-- 1 pcp pcp 144K Feb 13 14:34 merged-archive-20140213.033222.meta
> > -rw-r--r-- 1 pcp pcp 335M Feb 13 19:43 merged-archive-20140213.084228.0
> > -rw-r--r-- 1 pcp pcp  53K Feb 13 19:43 merged-archive-20140213.084228.index
> > -rw-r--r-- 1 pcp pcp 120K Feb 13 19:43 merged-archive-20140213.084228.meta
> 
> > [iow, this is a small host configuration - yet those logs seem big?]
> 
> I don't know.  pmmgr does not create archives: it's pmlogger and
> pmlogextract, with the documented/saved configurations.

I wonder if we're somehow doing multiple merges of the same data?
(Not sure how, but I can't explain it any other way.)  Ken, does
pmlogextract do exact-duplicate result elimination?  I'd imagine
it does not (it would be quite difficult & expensive to detect).
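
To make the "exact duplicate" question concrete, here is a minimal
sketch of what such a check could look like.  It is purely
illustrative: it does not read PCP archives and says nothing about
what pmlogextract actually does.  The (timestamp, payload) record
shape and the function name are assumptions made up for the example.

```python
import hashlib
from collections import Counter


def count_exact_duplicates(records):
    """Count records that exactly repeat an earlier record.

    `records` is an iterable of (timestamp, payload_bytes) tuples.
    This shape is hypothetical, not the PCP archive format.
    """
    seen = Counter()
    for timestamp, payload in records:
        # Hash timestamp and payload together so that only records
        # identical in both count as duplicates.
        key = repr(timestamp).encode() + b"\0" + payload
        seen[hashlib.sha256(key).hexdigest()] += 1
    # Every occurrence beyond the first is a duplicate.
    return sum(n - 1 for n in seen.values() if n > 1)
```

Even in this toy form the cost is visible: every record has to be
hashed and remembered for the whole merge.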

> For comparison, tofan.yyz's two-week merged logs are about that size,
> storing right about 40 MB/day of pmlogger traffic, whereas on a
> smaller workstation at home it's about a third of that.  (It used to
> be way more, back before my first pmlogger optimization; I didn't
> study the aftereffects of yours.)

(Should be very similar - those later changes were more about pmFetch
optimisation, which would have a less profound effect than the earlier
metric de-dup work -- except for some corner cases like metrics
with instances specified, but that won't be the case here, I imagine.)

> 
> > nathans@verge:/source/git/nathans-pcp$ ps -ef | grep pmmgr
> > pcp       8022     1  0 Feb10 ?        00:00:01 /usr/bin/pmie [...]
> > pcp      10670     1  0 Feb11 ?        00:00:01 /usr/bin/pmie [...]
> > pcp      19218 19214  0 19:42 pts/0    00:00:00 /usr/bin/pmie [...]
> > pcp      20518     1  0 Feb12 ?        00:00:00 /usr/bin/pmie [...]
> 
> (Those pmie processes must have been left from older runs; I believe I
> fixed their killing.)

Oh, I overlooked those entirely - thanks.

> 
> > Typical daily log size from pmlogger_daily here is ~6MB...
> > -rw-r--r-- 1 pcp pcp 6469616 Feb 13 00:10 20140212.0
> > -rw-r--r-- 1 pcp pcp    1512 Feb 13 00:10 20140212.index
> > -rw-r--r-- 1 pcp pcp   12982 Feb 13 00:10 20140212.meta
> 
> That must be from a different pmlogger configuration (diff the /etc
> and /var/log/pcp/pmmgr/$host configurations), or else there must be a
> serious bug in pmlogextract or somesuch.  I don't see how pmmgr per se
> could be responsible for such an order-of-magnitude difference.

The only way I can imagine we'd reach these dizzying sizes is if the
same data is somehow being merged multiple times.  No idea how that
could be; it seems a far-fetched theory, but it's the best I've got so far.
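
For a sense of scale, a rough back-of-the-envelope calculation using
only the sizes quoted in this thread.  It assumes the "M" in the
ls output means MiB, and that the pmmgr-started pmlogger records at
about the same rate as the pmlogger_daily one (which may well not
hold, since the configurations could differ).

```python
# Sizes taken from the file listings quoted in this thread.
DAILY_BYTES = 6469616       # 20140212.0, written by pmlogger_daily
MIB = 1024 * 1024           # assumption: "M" in the listing means MiB

MERGED_BYTES = {
    "merged-archive-20140213.033222.0": 546 * MIB,
    "merged-archive-20140213.084228.0": 335 * MIB,
}


def days_equivalent(merged_bytes, daily_bytes=DAILY_BYTES):
    """How many days of daily-log data a merged archive would hold."""
    return merged_bytes / daily_bytes


for name, size in MERGED_BYTES.items():
    print(f"{name}: ~{days_equivalent(size):.0f} days of daily-log data")
```

Under those assumptions the two merged archives work out to very
roughly 88 and 54 days' worth of the ~6MB daily logs.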

> 
> Sure: both the /etc/pcp primary-pmlogger data and the
> /usr/log/pcp/pmmgr/$host ones.
> 

I'm uploading all the logs and configuration to somewhere public on
oss.sgi.com where everyone can grab them.  I'll send a followup note
once it's finished - ETA another 20 mins.  Thanks for the help guys!

Ken, I'd be interested in your thoughts on a few observations from
the pmlogger_daily scripts on this host (all defaults are in place,
for everything - both pmlogger_daily & pmmgr):
- there seems to be some older data that is not being culled?
- there are a couple of logs that haven't merged, possibly because one
has a zero-sized file or two?
- from inspection of the pmlogger_daily script, we appear to always
do a logmerge, even if there is only one log - could this not simply
be handled via a mv(1) of the files?  (avoiding the read/write I/O
there entirely, for the simple case of one archive for the previous
day - i.e. no pmcd/pmlogger restarts).
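
The last point could be sketched roughly as below.  This is not the
pmlogger_daily logic (that is a shell script); it only illustrates
the idea.  The function names are hypothetical, the input naming
(day, then a dot, then a time stamp) is an assumption, only the
.0/.index/.meta files seen in the listings above are handled, and
anything else the merge step may do for a single archive is ignored.

```python
import os

# File suffixes seen in the listings above; further data volumes
# (.1, .2, ...) are deliberately ignored in this sketch.
ARCHIVE_SUFFIXES = (".0", ".index", ".meta")


def archive_bases(directory, day):
    """Return the sorted base names of the input archives for `day`.

    Assumes input archives are named "<day>.<something>", so the
    already-merged archive named just "<day>" is not picked up.
    """
    bases = set()
    for name in os.listdir(directory):
        root, ext = os.path.splitext(name)
        if ext in ARCHIVE_SUFFIXES and root.startswith(day + "."):
            bases.add(root)
    return sorted(bases)


def consolidate_day(directory, day):
    """Rename a lone input archive to `day`; otherwise change nothing.

    Returns the list of archive bases that still need a real merge
    (empty if the rename was enough or there was nothing to do).
    """
    inputs = archive_bases(directory, day)
    if len(inputs) != 1:
        return inputs
    src = inputs[0]
    for suffix in ARCHIVE_SUFFIXES:
        src_path = os.path.join(directory, src + suffix)
        if os.path.exists(src_path):
            # Same directory, so this is a rename with no data copy.
            os.rename(src_path, os.path.join(directory, day + suffix))
    return []
```

With one input archive for the day this costs three renames and no
data I/O; with several, the caller still has to fall back to a merge.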

cheers.

--
Nathan
