
Re: Possible pmmgr issue?

To: Nathan Scott <nathans@xxxxxxxxxx>
Subject: Re: Possible pmmgr issue?
From: "Frank Ch. Eigler" <fche@xxxxxxxxxx>
Date: Thu, 13 Feb 2014 07:58:11 -0500
Cc: pcp developers <pcp@xxxxxxxxxxx>
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <1444843200.6174759.1392282098405.JavaMail.zimbra@xxxxxxxxxx>
References: <1952377955.6159460.1392281163287.JavaMail.zimbra@xxxxxxxxxx> <1444843200.6174759.1392282098405.JavaMail.zimbra@xxxxxxxxxx>
User-agent: Mutt/1.4.2.2i
Hi -


> [...]
> Note the size of the 2 merge archives.  I need to clean this up,
> as my rootfs is out of space once more.  Any suggestions on what
> to look for before I do that?  Thanks.

From what I see, it looks like you just happened to catch the machine
mid-merge, just after the pmcd was lost & restarted, so pmmgr stopped
& restarted the pmlogger.


> -rw-r--r-- 1 pcp pcp 546M Feb 13 14:34 merged-archive-20140213.033222.0
> -rw-r--r-- 1 pcp pcp  85K Feb 13 14:34 merged-archive-20140213.033222.index
> -rw-r--r-- 1 pcp pcp 144K Feb 13 14:34 merged-archive-20140213.033222.meta
> -rw-r--r-- 1 pcp pcp 335M Feb 13 19:43 merged-archive-20140213.084228.0
> -rw-r--r-- 1 pcp pcp  53K Feb 13 19:43 merged-archive-20140213.084228.index
> -rw-r--r-- 1 pcp pcp 120K Feb 13 19:43 merged-archive-20140213.084228.meta

> [iow, this is a small host configuration - yet those logs seem big?]

I don't know.  pmmgr does not create archives itself: pmlogger and
pmlogextract do, using the documented/saved configurations.

For comparison, tofan.yyz's two-week merged logs are about that size,
storing right about 40 MB/day of pmlogger traffic, whereas on a
smaller workstation at home it's about a third of that.  (It used to
be way more, back before my first pmlogger optimization; I didn't
study the aftereffects of yours.)
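
As a rough sanity check on the figures quoted in this thread, here is a
back-of-envelope sketch.  It only does arithmetic on the numbers above
(40 MB/day over two weeks, the ~6 MB pmlogger_daily volume, and the
571969000-byte merged volume); the helper names are made up for
illustration, and 1 MB is assumed to mean 10^6 bytes.

```python
# Back-of-envelope arithmetic on sizes quoted in the thread.
# Assumption: 1 MB = 10**6 bytes; function names are illustrative only.
MB = 1000 * 1000


def expected_merged_size(bytes_per_day, days):
    """Size a merged archive would reach at a steady daily logging rate."""
    return bytes_per_day * days


def days_of_data(merged_bytes, bytes_per_day):
    """How many days of logging at the given rate a merged archive equals."""
    return merged_bytes / bytes_per_day


if __name__ == "__main__":
    # tofan.yyz: about 40 MB/day retained for two weeks.
    print(expected_merged_size(40 * MB, 14) / MB, "MB expected after two weeks")

    # verge: merged .0 volume versus the pmlogger_daily .0 volume.
    ratio = days_of_data(571969000, 6469616)
    print(round(ratio, 1), "days of data at the pmlogger_daily rate")
```

If the merged archive on verge covers far fewer days than that ratio
suggests, the two loggers are evidently not recording the same
metrics or intervals.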


> nathans@verge:/source/git/nathans-pcp$ ps -ef | grep pmmgr
> pcp       8022     1  0 Feb10 ?        00:00:01 /usr/bin/pmie [...]
> pcp      10670     1  0 Feb11 ?        00:00:01 /usr/bin/pmie [...]
> pcp      19218 19214  0 19:42 pts/0    00:00:00 /usr/bin/pmie [...]
> pcp      20518     1  0 Feb12 ?        00:00:00 /usr/bin/pmie [...]

(Those pmie processes must have been left over from older runs; I
believe I have since fixed the code that kills them.)


> nathans@verge:/source/git/nathans-pcp$ cat /var/log/pcp/pmmgr/pmmgr.log
> [Thu Feb 13 19:42:18] pmmgr(19195/19195): Log started
> [Thu Feb 13 19:42:18] pmmgr(19195/19195): /etc/pcp/pmmgr: new hostid verge at local:
> nathans@verge:/source/git/nathans-pcp$ 
> nathans@verge:/source/git/nathans-pcp$ cat /var/log/pcp/pmmgr/pmmgr.log.prev 
> [Wed Feb 12 14:29:39] pmmgr(20467/20467): Log started
> [Wed Feb 12 14:29:39] pmmgr(20467/20467): /etc/pcp/pmmgr: new hostid verge at local:
> [Thu Feb 13 19:41:08] pmmgr(20467/20467): /etc/pcp/pmmgr: dead hostid verge
> [Thu Feb 13 19:41:27] pmmgr(20467/20467): Log finished

... so your pmcd went down twice in 80 seconds, which pmmgr detected
and responded to as designed.


> Typical daily log size from pmlogger_daily here is ~6MB...
> -rw-r--r-- 1 pcp pcp 6469616 Feb 13 00:10 20140212.0
> -rw-r--r-- 1 pcp pcp    1512 Feb 13 00:10 20140212.index
> -rw-r--r-- 1 pcp pcp   12982 Feb 13 00:10 20140212.meta

That must be from a different pmlogger configuration (diff the /etc
and /var/log/pcp/pmmgr/$host configurations), or else there must be a
serious bug in pmlogextract or somesuch.  I don't see how pmmgr per se
could be responsible for such an order-of-magnitude difference.
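
One way to do that comparison is sketched below with Python's standard
difflib module.  This is only an illustration: the function name is
made up, and the two file paths must be supplied by the caller, since
the exact configuration file names under /etc and
/var/log/pcp/pmmgr/$host are not given in this thread.

```python
# Sketch: unified diff of two pmlogger configuration files.
# Assumption: the caller knows both paths; none are hard-coded here.
import difflib
import sys
from pathlib import Path


def diff_configs(path_a, path_b):
    """Return the unified diff between two text files as a list of lines."""
    a = Path(path_a).read_text().splitlines(keepends=True)
    b = Path(path_b).read_text().splitlines(keepends=True)
    return list(difflib.unified_diff(a, b,
                                     fromfile=str(path_a),
                                     tofile=str(path_b)))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python diff_configs.py <primary-config> <pmmgr-host-config>
    sys.stdout.writelines(diff_configs(sys.argv[1], sys.argv[2]))
```

An empty result means the two configurations are textually identical,
which would point away from a configuration difference as the cause.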


> nathans@verge:/source/git/nathans-pcp$ ls -l /var/log/pcp/pmmgr/verge/
> total 559904
> -rw-r--r-- 1 pcp pcp 571969000 Feb 13 19:44 merged-archive-20140213.084228.0
> -rw-r--r-- 1 pcp pcp     86872 Feb 13 19:44 merged-archive-20140213.084228.index
> -rw-r--r-- 1 pcp pcp    155489 Feb 13 19:44 merged-archive-20140213.084228.meta
> [...]

> All looks fine, hmmm.  Nothing untoward in the pmlogger config AFAICS.
> I'll dig some more tomorrow - can put the archive someplace if you want
> to take a peek too?

Sure: both the /etc/pcp primary-pmlogger data and the
/var/log/pcp/pmmgr/$host ones.


- FChE
