Obtained pcp 2.7.8 from git to get at the Perl PMDA bits. Built their
own Perl-based Lustre PMDA. This combination worked for many weeks. Then
I configured pmlogger.
service100 /var/log/pcp/pmcd # ls -altr
total 5992
-rw-r--r-- 1 root root 106 Nov 19 11:24 simple.log
drwxr-xr-x 6 root root 4096 Dec 3 15:11 ..
-rw-r--r-- 1 root root 790 Feb 25 18:41 pmcd.log.prev
-rw-r--r-- 1 root root 939 Feb 25 18:41 lustre.log.prev
-rw-r--r-- 1 root root 790 Feb 26 00:41 pmcd.log
-rw-r--r-- 1 root root 939 Feb 26 00:41 lustre.log
-rw------- 1 root root 5828608 Feb 26 00:41 core
-rwxr-xr-x 1 root root 200006 Feb 26 07:32 pmcd
-rw-r--r-- 1 root root 234275 Feb 26 07:33 pmcdfail.service100.200902260041.tar.gz
drwxr-xr-x 2 root root 4096 Feb 26 07:33 .
service100 /var/log/pcp/pmcd # cat pmcd.log
Log for pmcd on service100 started Wed Feb 25 22:07:49 2009
active agent dom pid in out ver protocol parameters
============ === ===== === === === ======== ==========
pmcd 2 2 dso i:2 lib=/var/lib/pcp/pmdas/pmcd/pmda_pmcd.so entry=pmcd_init [0x2aaaab057e1a]
linux 60 2 dso i:3 lib=/var/lib/pcp/pmdas/linux/pmda_linux.so entry=linux_init [0x2aaaab28fa53]
lustre 142 22470 9 10 2 bin pipe cmd=perl /var/lib/pcp/pmdas/lustre/pmdalustre.pl
Host access list empty: access control turned off
pmcd: PID = 22456, PDU version = 2
pmcd request port(s):
sts fd port IP addr
=== === ===== ==========
ok 0 44321 0x00000000 INADDR_ANY
[Thu Feb 26 00:41:01] pmcd(22456) Error: Unexpected signal 11 ...
Dumping to core ...
service100 /var/log/pcp/pmcd # cat lustre.log
Log for pmdalustre on service100 started Wed Feb 25 22:07:49 2009
[Wed Feb 25 22:07:49] lustre(22470) Info: lustre_refresh_mds()
[Wed Feb 25 22:07:49] lustre(22470) Info: lustre_refresh_ost()
[Wed Feb 25 22:07:49] lustre(22470) Info: lustre_refresh_fsnames()
Use of uninitialized value in hash element at /var/lib/pcp/pmdas/lustre/pmdalustre.pl line 292.
Use of uninitialized value in concatenation (.) or string at /var/lib/pcp/pmdas/lustre/pmdalustre.pl line 293.
Use of uninitialized value in hash element at /var/lib/pcp/pmdas/lustre/pmdalustre.pl line 293.
[Wed Feb 25 22:07:49] lustre(22470) Info: lustre_refresh_fsnames: nobackupp2
[Wed Feb 25 22:07:49] lustre(22470) Info: lustre_refresh_osc()
[Wed Feb 25 22:07:49] lustre(22470) Info: lustre_refresh_osc: 60
[Wed Feb 25 22:07:49] lustre(22470) Info: lustre_refresh_llite()
[Wed Feb 25 22:07:49] lustre(22470) Info: lustre_refresh_llite: 0
Log finished Thu Feb 26 00:41:01 2009
service100 /var/log/pcp/pmcd # gdb pmcd core
GNU gdb 6.6
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-suse-linux"...
Using host libthread_db library "/lib64/libthread_db.so.1".
warning: exec file is newer than core file.
Reading symbols from /usr/lib64/libpcp.so.3...done.
Loaded symbols for /usr/lib64/libpcp.so.3
Reading symbols from /lib64/libdl.so.2...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libc.so.6...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /var/lib/pcp/pmdas/pmcd/pmda_pmcd.so...done.
Loaded symbols for /var/lib/pcp/pmdas/pmcd/pmda_pmcd.so
Reading symbols from /usr/lib64/libpcp_pmda.so.3...done.
Loaded symbols for /usr/lib64/libpcp_pmda.so.3
Reading symbols from /var/lib/pcp/pmdas/linux/pmda_linux.so...done.
Loaded symbols for /var/lib/pcp/pmdas/linux/pmda_linux.so
Reading symbols from /lib64/libpthread.so.0...done.
Loaded symbols for /lib64/libpthread.so.0
Core was generated by `/usr/share/pcp/bin/pmcd'.
Program terminated with signal 6, Aborted.
#0 0x00002aaaaae40bb5 in raise () from /lib64/libc.so.6
(gdb) where
#0 0x00002aaaaae40bb5 in raise () from /lib64/libc.so.6
#1 0x00002aaaaae41fb0 in abort () from /lib64/libc.so.6
#2 0x0000000000406158 in SigBad (sig=11) at pmcd.c:913
#3 <signal handler called>
#4 0x0000000000410823 in AcceptNewClient (reqfd=0) at client.c:69
#5 0x0000000000405bc8 in ClientLoop () at pmcd.c:710
#6 0x0000000000406a90 in main (argc=1, argv=0x7fffffffe748) at pmcd.c:1116
(gdb) quit
service100 /var/log/pcp/pmcd # rpm -qa | grep pcp
pcp-2.7.8-20081117
ldapcpplib-0.0.4-14.13
service100 /var/lib/pcp/config/pmlogger # cat config.nas_lmds
#pmlogger Version 1
#
#
#
# basic pmlogger config file sufficient for ...
#
# pmchart Overview
# osvis
# dkvis
# mpvis
# pmstat
# pmlogconf default selections
# pmieconf rules
#
# edited locally to be both more and less than the above
# adding lustre.mds support
#
log mandatory on once { hinv }
log advisory on default {
disk.all
disk.ctl.avg_disk.active
disk.dev
filesys
kernel.all
kernel.percpu
mem.freemem
mem.util
mem.util.free
mem.util.fs_clean
mem.util.fs_ctl
mem.util.fs_dirty
mem.util.kernel
mem.util.user
network.interface
network.tcp.closed
network.tcp.conndrops
network.tcp.drops
network.tcp.rcvtotal
network.tcp.rexmttimeo
network.tcp.sndpack
network.tcp.sndrexmitpack
network.tcp.sndtotal
network.tcp.timeoutdrop
network.udp.ipackets
network.udp.opackets
rpc
lustre.mds
}