pcp

Re: [pcp] pmlogger_merge failing on Centos 6.4

To: Nathan Scott <nathans@xxxxxxxxxx>
Subject: Re: [pcp] pmlogger_merge failing on Centos 6.4
From: Chandana De Silva <chandana@xxxxxxxxxxxxx>
Date: Mon, 24 Jun 2013 13:58:18 +1000
Cc: pcp@xxxxxxxxxxx
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <2106903857.6445515.1372040111175.JavaMail.root@xxxxxxxxxx>
References: <51C79DC2.8090606@xxxxxxxxxxxxx> <2106903857.6445515.1372040111175.JavaMail.root@xxxxxxxxxx>
Reply-to: chandana@xxxxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130510 Thunderbird/17.0.6
All,

This looks like a memory problem. The pmlogger host has only 1 GB, and I now see OOM-Killer messages in syslog.

I will give the host some more memory and see what happens.
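
For anyone hitting the same thing, a rough way to check for OOM-killer activity might look like the sketch below. It assumes syslog lives at /var/log/messages (the usual place on CentOS 6) and that the kernel messages contain one of the matched phrases; oom_events is just a made-up helper name, not something shipped with PCP.

```shell
# Sketch only: list kernel OOM-killer events recorded in a syslog file.
# Assumptions: syslog is at /var/log/messages (typical on CentOS 6) and the
# kernel messages contain one of the phrases matched below.
# oom_events is a hypothetical helper name, not part of PCP.
oom_events() {
    log_file="${1:-/var/log/messages}"
    grep -iE 'out of memory|oom-killer|killed process' "$log_file"
}

# Example (commented out so the sketch has no side effects):
# oom_events /var/log/messages | tail
```

A match naming pmlogextract at about the time of the nightly cron job would be consistent with the "Killed" line in the trace.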

On 24/06/13 12:15, Nathan Scott wrote:
Hi Chandana,

----- Original Message -----
All,

The nightly cron job to merge logs is failing on a new CentOS 6.4 box.
I ran /usr/libexec/pcp/bin/pmlogger_daily with the -t option, but the
trace (attached) is silent on why it is failing.

Hmmm, relevant bit of the trace is this ...

Performance metrics from host corona-int.m4u.com.au
   commencing Sat Jun 22 15:17:42.916 2013
   ending     Sun Jun 23 07:56:22.874 2013
Archive timezone: EST-10
PID for pmlogger: 11308
Input archives to be merged:
        20130622.14.27
        20130622.14.27-00
        20130622.14.28
        20130622.15.12
        20130622.15.16
/usr/libexec/pcp/bin/pmlogger_merge: line 253: 27438 Killed                  $cmd
pmlogger_merge: Directory: /var/log/pcp/pmlogger/corona
pmlogger_merge: Failed: pmlogextract  20130622.14.27 20130622.14.27-00 20130622.14.28 20130622.15.12 20130622.15.16 20130622

... which is indeed lacking details as to why pmlogextract failed.

If this machine is recording the mysql metrics, and you have the patch
you sent me earlier today running on your mysql server, I suspect it
may be the root cause.  That patch changed the type and units of one of
the mysql metrics, which pmlogextract won't be able to reconcile.

You might get more detailed information from a manual pmlogger_merge
on those archives (see the final "Failed" line above) - the -VV option
can be used to ratchet up its verbosity.
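
A manual rerun along those lines could be sketched as below. The script path, archive directory and archive names are taken from the trace above, and -VV is used as described in this message; manual_merge and the PMLOGGER_MERGE override are hypothetical names for illustration only.

```shell
# Sketch only: rerun the failed merge by hand with extra verbosity.
# The default script path comes from the trace above; -VV is the verbosity
# option mentioned in this message. manual_merge and PMLOGGER_MERGE are
# hypothetical names introduced for this example.
manual_merge() {
    archive_dir="$1"
    shift
    # Run in a subshell so the caller's working directory is unchanged.
    (
        cd "$archive_dir" || exit 1
        "${PMLOGGER_MERGE:-/usr/libexec/pcp/bin/pmlogger_merge}" -VV "$@"
    )
}

# Example, using the inputs and output name from the "Failed" line:
# manual_merge /var/log/pcp/pmlogger/corona \
#     20130622.14.27 20130622.14.27-00 20130622.14.28 \
#     20130622.15.12 20130622.15.16 20130622
```

The last argument is the output archive name, matching the order shown in the "Failed" line.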

If it turns out this is from the pmdamysql change, you'll want to add a
new pmlogrewrite(1) file to help yourself and others out - see example
of /var/lib/pcp/config/pmlogrewrite/linux_proc_migrate.conf - IIRC, it
will automatically fix this issue (on all affected servers) once that
is in place.

cheers.

--
Nathan
