Hmmm.. I rebooted these nodes this morning and now the problem appears to have
disappeared....
Very strange. I am going back and looking at when 3.3.3-1 was installed; it
could be that something was out of sync in this cluster.
-----Original Message-----
From: Nathan Scott [mailto:nathans@xxxxxxxxxx]
Sent: Wednesday, August 18, 2010 5:46 PM
To: Siekas, Greg
Cc: pcp@xxxxxxxxxxx
Subject: Re: [pcp] pcp 3.3.3-1 problem
----- "Greg Siekas" <greg.siekas@xxxxxxxxxx> wrote:
> Nathan,
>
> Thanks for the reply, here's the details you requested.
>
> It's failing on the kernel.pernode.cpu.nice metric.
>
Interesting - it looks a lot like a problem I fixed just
before 3.3.3 ... are you sure you are on 3.3.3 and not
3.3.2? Can you start pmcd and send the output from the
pcp(1) command?
> ...
> kernel.pernode.cpu.user
> inst [0 or "node0"] value 6776690
> inst [1 or "node1"] value 6913290
> kernel.pernode.cpu.nice
> kernel.pernode.cpu.nice: pmFetch: IPC protocol failure
> ...
I have an x86_64 system here which is exactly this config
and it's got no issues. It *did* have issues on 3.3.2 ...
so, hopefully this is just a case of mistaken identity.
If not, the next thing to do will be to run pmcd via valgrind
with the same (-f) args, retry the fetch test, then see what
valgrind reports when it fails.
cheers.
--
Nathan