From owner-pcp@oss.sgi.com Wed Nov 1 05:04:21 2000
Received: by oss.sgi.com id ; Wed, 1 Nov 2000 05:04:11 -0800
Received: from tisch.mail.mindspring.net ([207.69.200.157]:62248 "EHLO tisch.mail.mindspring.net") by oss.sgi.com with ESMTP id ; Wed, 1 Nov 2000 05:04:02 -0800
Received: from azif.vallinor4.com (user-2ivedmd.dialup.mindspring.com [165.247.54.205]) by tisch.mail.mindspring.net (8.9.3/8.8.5) with ESMTP id IAA28818 for ; Wed, 1 Nov 2000 08:04:00 -0500 (EST)
Received: (from abel@localhost) by azif.vallinor4.com (8.9.3/8.9.3) id IAA01324; Wed, 1 Nov 2000 08:09:38 -0500
X-Authentication-Warning: azif.vallinor4.com: abel set sender to abel@vallinor4.com using -f
To: pcp@oss.sgi.com
Subject: local pmlogger disable
References:
From: abel@vallinor4.com (Alexander L. Belikoff)
Date: 01 Nov 2000 08:09:38 -0500
In-Reply-To: Alan Bailey's message of "Tue, 31 Oct 2000 14:18:10 -0600 (CST)"
Message-ID:
Lines: 23
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender: owner-pcp@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;pcp-outgoing

Hello everybody -

I've just discovered PCP, so first of all, thanks SGI folks - the tool
seems to be great and of all cluster management tools around, PCP
seems to be the only one designed with scalability in mind. Thanks!

Now, I'm trying to run PCP on a number of machines with logging on a
special standalone node. I was able to start pmlogger for each of the
collector nodes on that standalone machine, yet I have some problems
disabling local pmlogger on the collector nodes. Commenting the
LOCALHOST entry in the control file doesn't seem to help. The docs
mention /etc/config/ files, yet they aren't present in the
installation (I'm using pcp-2.1.10-8 on RedHat 6.2).

So, how do I disable the local pmlogger?

Thanks in advance,

--
Alexander L. Belikoff          GPG f/pr: 0D58 A804 1AB1 4CD8 8DA9
Bloomberg L.P.                           424B A86E CD0D 8424 2701
mailto://abel@vallinor4.com    (http://pgp5.ai.mit.edu for the key)

From owner-pcp@oss.sgi.com Wed Nov 1 15:00:34 2000
Received: by oss.sgi.com id ; Wed, 1 Nov 2000 15:00:14 -0800
Received: from pneumatic-tube.sgi.com ([204.94.214.22]:27146 "EHLO pneumatic-tube.sgi.com") by oss.sgi.com with ESMTP id ; Wed, 1 Nov 2000 14:59:53 -0800
Received: from rattle.melbourne.sgi.com (rattle.melbourne.sgi.com [134.14.55.145]) by pneumatic-tube.sgi.com (980327.SGI.8.8.8-aspam/980310.SGI-aspam) via ESMTP id PAA01102 for ; Wed, 1 Nov 2000 15:02:59 -0800 (PST) mail_from (kenmcd@melbourne.sgi.com)
Received: from localhost (kenmcd@localhost) by rattle.melbourne.sgi.com (SGI-8.9.3/8.9.3) with ESMTP id JAA73941; Thu, 2 Nov 2000 09:50:16 +1100 (AEDT)
X-Authentication-Warning: rattle.melbourne.sgi.com: kenmcd owned process doing -bs
Date: Thu, 2 Nov 2000 09:50:16 +1100
From: Ken McDonell
Reply-To: kenmcd@sgi.com
To: "Alexander L. Belikoff"
cc: pcp@oss.sgi.com
Subject: Re: local pmlogger disable
In-Reply-To:
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-pcp@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;pcp-outgoing

On 1 Nov 2000, Alexander L. Belikoff wrote:
>
> Hello everybody -
>
> I've just discovered PCP, so first of all, thanks SGI folks - the tool
> seems to be great and of all cluster management tools around, PCP
> seems to be the only one designed with scalability in mind. Thanks!

You're welcome ... scalability is one of the benefits of PCP over
competing infrastructures for capturing and transporting performance
data.

> Now, I'm trying to run PCP on a number of machines with logging on a
> special standalone node. I was able to start pmlogger for each of the
> collector nodes on that standalone machine, yet I have some problems
> disabling local pmlogger on the collector nodes. Commenting the
> LOCALHOST entry in the control file doesn't seem to help. The docs
> mention /etc/config/ files, yet they aren't present in the
> installation (I'm using pcp-2.1.10-8 on RedHat 6.2).
>
> So, how do I disable the local pmlogger?

This is a logic botch on our part ... it looks like the open source
version of /etc/rc.d/init.d/pcp has carried over some version bridging
logic from the IRIX version, but did not include the chkconfig
controls. We'll need to investigate the correct fix for this.

Pro tem, you have the following options.

0. ignore it ... the default pmlogger config means pmlogger spends most
   of its time asleep.

1. don't run pmcd at all on the local machine (the remote pmloggers do
   not need it), using chkconfig to turn pcp off, or

2. edit /etc/rc.d/init.d/pcp and remove or comment out the following
   block in _start_pmlogger():

	if grep '^LOCALHOSTNAME[ ]*y[ ]' $PMLOGCTRL >/dev/null
	then
	    ...
	elif grep "^$LOCALHOSTNAME[ ]*y[ ]" $PMLOGCTRL >/dev/null
	then
	    ...
	else
	    ...
	fi

3. edit /var/pcp/config/pmlogger/control and in the LOCALHOSTNAME line
   change -c config.default to -c bogus ... this will make pmlogger die
   immediately (but will send mail to root, so this is probably the
   least attractive option).
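
The two grep tests quoted in option 2 can be tried outside the rc script. The following is only a minimal sketch in Python, not the script's actual code: the function name has_primary_logger, the sample control line, and its column layout are assumptions made for illustration, and the branches elided with "..." in the message are not reproduced. Unlike the shell original, the host name is escaped before being used as a pattern.

```python
import re


def has_primary_logger(control_text: str, hostname: str) -> bool:
    """Mirror the two grep tests from option 2 (illustrative only).

    Returns True when an uncommented line starts with the literal word
    LOCALHOSTNAME, or with the given host name, followed by optional
    blanks, a "y", and one more blank.
    """
    for name in ("LOCALHOSTNAME", hostname):
        # re.escape is an addition: the shell version interpolates
        # $LOCALHOSTNAME unescaped, so dots there act as wildcards.
        pattern = re.compile(
            r"^" + re.escape(name) + r"[ \t]*y[ \t]", re.MULTILINE
        )
        if pattern.search(control_text):
            return True
    return False


# Hypothetical control file content; the real column layout may differ.
SAMPLE_CONTROL = (
    "# comment line\n"
    "LOCALHOSTNAME\ty   n   /var/log/pcp/pmlogger/LOCALHOSTNAME   -c config.default\n"
)

print(has_primary_logger(SAMPLE_CONTROL, "azif"))
```

A line that has been commented out with a leading "#" fails both tests because of the "^" anchor, so commenting out the entry sends the script to its final else branch rather than to either of the first two.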
From owner-pcp@oss.sgi.com Wed Nov 1 15:18:04 2000
Received: by oss.sgi.com id ; Wed, 1 Nov 2000 15:17:54 -0800
Received: from deliverator.sgi.com ([204.94.214.10]:32276 "EHLO deliverator.sgi.com") by oss.sgi.com with ESMTP id ; Wed, 1 Nov 2000 15:17:36 -0800
Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by deliverator.sgi.com (980309.SGI.8.8.8-aspam-6.2/980310.SGI-aspam) via SMTP id PAA20571 for ; Wed, 1 Nov 2000 15:09:46 -0800 (PST) mail_from (markgw@sgi.com)
Received: from sandpit.melbourne.sgi.com (sandpit.melbourne.sgi.com [134.14.55.132]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id KAA23827; Thu, 2 Nov 2000 10:14:59 +1100
Date: Thu, 2 Nov 2000 10:14:58 +1100 (EST)
From: Mark Goodwin
X-Sender: markgw@sandpit.melbourne.sgi.com
To: kenmcd@sgi.com
cc: "Alexander L. Belikoff" , pcp@oss.sgi.com
Subject: Re: local pmlogger disable
In-Reply-To:
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-pcp@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;pcp-outgoing

On Thu, 2 Nov 2000, Ken McDonell wrote:
>
> 1. don't run pmcd at all on the local machine (the remote pmloggers do
>    not need it), using chkconfig to turn pcp off, or

clarification: you _do_ need pmcd running on every host that is being
monitored with pmlogger. And on the host you are running pmlogger,
you need pcp to be chkconfig "on" (otherwise the pcp rc script will
not start any pmloggers).
-- Mark

From owner-pcp@oss.sgi.com Thu Nov 2 06:17:22 2000
Received: by oss.sgi.com id ; Thu, 2 Nov 2000 06:17:12 -0800
Received: from heffalump.fnal.gov ([131.225.9.20]:46844 "EHLO fnal.gov") by oss.sgi.com with ESMTP id ; Thu, 2 Nov 2000 06:16:48 -0800
Received: from fnal.gov ([131.225.80.75]) by smtp.fnal.gov (PMDF V6.0-24 #44770) with ESMTP id <0G3E00B46I8ZO6@smtp.fnal.gov> for pcp@oss.sgi.com; Thu, 02 Nov 2000 08:14:59 -0600 (CST)
Date: Thu, 02 Nov 2000 08:14:59 -0600
From: Troy Dawson
Subject: Re: local pmlogger disable
Cc: pcp@oss.sgi.com
Message-id: <3A0176E3.A0D7677A@fnal.gov>
MIME-version: 1.0
X-Mailer: Mozilla 4.73 [en] (X11; U; Linux 2.2.16-3smp i686)
Content-type: text/plain; charset=us-ascii
Content-transfer-encoding: 7bit
X-Accept-Language: en
References:
To: unlisted-recipients:; (no To-header on input)
Sender: owner-pcp@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;pcp-outgoing

Ken McDonell wrote:
>
> Pro tem, you have the following options.
>
> 0. ignore it ... the default pmlogger config means pmlogger spends most
>    of its time asleep.
>

This is the option that I use. It's really not all that bad, plus, if we
somehow lost all of our other remote loggers, we'd at least have a history of
the load.

If you are worried about it filling up vast amounts of diskspace, I just
checked. For a machine that has been running with this on since April it took
a total of 296k. (du -hs /var/log/pcp/pmlogger). This is on RedHat 6.1,
running pcp 2.1.4.

So I'd say leaving the local logger on is only going to cost you about 1 Meg a
year.
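
The "about 1 Meg a year" figure can be checked by scaling the measured 296k linearly to a full year. The sketch below is only an illustration: the message says "since April" without a day, so the start date of 1 April 2000 is an assumption, the end date is taken from the message's Date header, and the helper name yearly_growth_kb is made up for this example.

```python
from datetime import date


def yearly_growth_kb(size_kb: float, start: date, end: date) -> float:
    """Scale an observed archive size linearly to a 365-day year."""
    days = (end - start).days
    if days <= 0:
        raise ValueError("end must be later than start")
    return size_kb * 365.0 / days


# Assumed start: 1 April 2000 (the message only says "since April").
# End: 2 November 2000, the date the message was sent.
estimate = yearly_growth_kb(296, date(2000, 4, 1), date(2000, 11, 2))
print(f"{estimate:.0f} KB per year")
```

Under that assumption the result is roughly half a megabyte per year, so 1 Meg is a comfortable upper bound. Even if logging had started on the last day of April, the linear estimate stays below 1 MB.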
Troy Dawson -- __________________________________________________ Troy Dawson dawson@fnal.gov (630)840-6468 Fermilab ComputingDivision/OSS SCS Group __________________________________________________ From owner-pcp@oss.sgi.com Thu Nov 2 09:51:43 2000 Received: by oss.sgi.com id ; Thu, 2 Nov 2000 09:51:32 -0800 Received: from deliverator.sgi.com ([204.94.214.10]:40739 "EHLO deliverator.sgi.com") by oss.sgi.com with ESMTP id ; Thu, 2 Nov 2000 09:51:10 -0800 Received: from rattle.melbourne.sgi.com (rattle.melbourne.sgi.com [134.14.55.145]) by deliverator.sgi.com (980309.SGI.8.8.8-aspam-6.2/980310.SGI-aspam) via ESMTP id JAA04230 for ; Thu, 2 Nov 2000 09:43:19 -0800 (PST) mail_from (kenmcd@melbourne.sgi.com) Received: from localhost (kenmcd@localhost) by rattle.melbourne.sgi.com (SGI-8.9.3/8.9.3) with ESMTP id EAA85124 for ; Fri, 3 Nov 2000 04:49:51 +1100 (AEDT) X-Authentication-Warning: rattle.melbourne.sgi.com: kenmcd owned process doing -bs Date: Fri, 3 Nov 2000 04:49:51 +1100 From: Ken McDonell Reply-To: kenmcd@sgi.com To: pcp@oss.sgi.com Subject: Re: local pmlogger disable In-Reply-To: <3A0176E3.A0D7677A@fnal.gov> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-pcp@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;pcp-outgoing Good points Troy. And we've fixed the problem in /etc/rc.d/init.d/pcp now, so this will be in the next source release at oss.sgi.com. If anyone needs the fix sooner, please contact me directly. On Thu, 2 Nov 2000, Troy Dawson wrote: > Ken McDonell wrote: > > > > Pro tem, you have the following options. > > > > 0. ignore it ... the default pmlogger config means pmlogger spends most > > of its time asleep. > > > > This is the option that I use. It's really not all that bad, plus, if we > somehow lost all of our other remote loggers, we'd at least have a history of > the load. > > If you are worried about it filling up vast amounts of diskspace, I just > checked. 
For a machine that has been running with this on since April it took > a total of 296k. (du -hs /var/log/pcp/pmlogger). This is on RedHat 6.1, > running pcp 2.1.4. > > So I'd say leaving the local logger on is only going to cost you about 1 Meg a > year. > > Troy Dawson > -- > __________________________________________________ > Troy Dawson dawson@fnal.gov (630)840-6468 > Fermilab ComputingDivision/OSS SCS Group > __________________________________________________ > From owner-pcp@oss.sgi.com Thu Nov 2 11:33:12 2000 Received: by oss.sgi.com id ; Thu, 2 Nov 2000 11:33:03 -0800 Received: from ex1.ncsa.uiuc.edu ([141.142.2.9]:49041 "EHLO ex1.ncsa.uiuc.edu") by oss.sgi.com with ESMTP id ; Thu, 2 Nov 2000 11:32:53 -0800 Received: from mx1.ncsa.uiuc.edu (mx1.ncsa.uiuc.edu [141.142.2.8]) by ex1.ncsa.uiuc.edu (8.11.0/8.11.0) with ESMTP id eA2JWqW18356 for ; Thu, 2 Nov 2000 13:32:52 -0600 (CST) X-Envelope-From: abailey@ncsa.uiuc.edu X-Envelope-To: Received: from osage.ncsa.uiuc.edu (osage.ncsa.uiuc.edu [141.142.2.56]) by mx1.ncsa.uiuc.edu (8.11.0/8.11.0) with ESMTP id eA2JWp312874 for ; Thu, 2 Nov 2000 13:32:52 -0600 (CST) Received: from localhost (abailey@localhost) by osage.ncsa.uiuc.edu (8.9.3/8.9.3) with ESMTP id NAA11489 for ; Thu, 2 Nov 2000 13:32:51 -0600 Date: Thu, 2 Nov 2000 13:32:51 -0600 (CST) From: Alan Bailey To: pcp@oss.sgi.com Subject: Suggested way of monitoring processes? Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-pcp@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;pcp-outgoing I'm looking to monitor a few things that aren't currently a part of the PCP pmdas. I want to check to make sure that a few important processes are running on the host being logged. These would be things like inetd, sshd, and other processes. I figure there might be two ways of doing this. 1. I could write my own pmdas for these, that just returns a 0 or 1 depending on if the process is running. 
This would take some time (for me) because I haven't done work with processes in C before or with writing pmdas. 2. I could utilize the current proc.psinfo.pid pmda that returns the whole process tree, and see if they are running by parsing through that. I also want to do a similar thing with NFS mounts, to make sure that remote disks are properly mounted. I'm leaning toward 1 just to keep everything being monitored on the same level, and so there isn't another layer needed (like in 2). However, it might be a lot of work. Any suggestions, or offers to write the pmda? ;-) Alan -- Alan Bailey From owner-pcp@oss.sgi.com Thu Nov 2 13:56:55 2000 Received: by oss.sgi.com id ; Thu, 2 Nov 2000 13:56:45 -0800 Received: from deliverator.sgi.com ([204.94.214.10]:19325 "EHLO deliverator.sgi.com") by oss.sgi.com with ESMTP id ; Thu, 2 Nov 2000 13:56:23 -0800 Received: from rattle.melbourne.sgi.com (rattle.melbourne.sgi.com [134.14.55.145]) by deliverator.sgi.com (980309.SGI.8.8.8-aspam-6.2/980310.SGI-aspam) via ESMTP id NAA16668 for ; Thu, 2 Nov 2000 13:48:33 -0800 (PST) mail_from (kenmcd@melbourne.sgi.com) Received: from localhost (kenmcd@localhost) by rattle.melbourne.sgi.com (SGI-8.9.3/8.9.3) with ESMTP id IAA01685; Fri, 3 Nov 2000 08:53:48 +1100 (AEDT) X-Authentication-Warning: rattle.melbourne.sgi.com: kenmcd owned process doing -bs Date: Fri, 3 Nov 2000 08:53:48 +1100 From: Ken McDonell Reply-To: kenmcd@sgi.com To: Alan Bailey cc: pcp@oss.sgi.com Subject: Re: Suggested way of monitoring processes? In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-pcp@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;pcp-outgoing The shping PMDA is a solution out of the box for this type of monitoring (it is a variant of 1.). At the end you'll find the help text and some sample values. Unfortunately this is part of the value-add part of PCP that is not open sourced ... it is in pcp-pro for Linux and SC4-PCP for IRIX. 
The creative de Bono solution is to use (or acquire) an SGI workstation
(running IRIX 6.5.5 or later) ... install pcp_eoe.sw.espping ... the
espping PMDA is a clone of the shping PMDA that could be hijacked to do
what you want.

I would be willing to consider the case for moving the shping PMDA to
open source if I had some justification from the community ... if this
is of interest to you, drop me a note addressing questions like:

- why you want it?
- how would you use it?
- is any SGI hardware involved?
- would this be a make-or-break issue for you embracing PCP in your
  environment?

On Thu, 2 Nov 2000, Alan Bailey wrote:
> I'm looking to monitor a few things that aren't currently a part of the
> PCP pmdas. I want to check to make sure that a few important processes
> are running on the host being logged. These would be things like inetd,
> sshd, and other processes. I figure there might be two ways of doing
> this.
>
> 1. I could write my own pmdas for these, that just returns a 0 or 1
> depending on if the process is running. This would take some time (for
> me) because I haven't done work with processes in C before or with writing
> pmdas.
>
> 2. I could utilize the current proc.psinfo.pid pmda that returns the whole
> process tree, and see if they are running by parsing through that.
>
> I also want to do a similar thing with NFS mounts, to make sure that
> remote disks are properly mounted.
>
> I'm leaning toward 1 just to keep everything being monitored on the same
> level, and so there isn't another layer needed (like in 2). However, it
> might be a lot of work. Any suggestions, or offers to write the pmda?
> ;-)

$ pminfo -tT shping

shping.status [command execution status for shping PMDA]
Help:
As each command is executed, the success or failure is encoded in
shping.status, using the following values:
  -1  PMDA is initializing and command has not been run yet
   0  command completed and exit status was 0
   1  command completed and exit status was non-zero
   2  command was run but terminated by a signal
   3  command was run but did not complete (usually a timeout)
   4  command was not run due to some system error or resource
      availability

shping.error [command execution error code for shping PMDA]
Help:
As each command is executed, if there is a problem, the error code or
cause is stored in shping.error. The interpretation of the value for
shping.error depends on shping.status as follows:

If shping.status is 1 (the command was run but returned a non-zero exit
status) then shping.error is the exit status.

If shping.status is 2 (the command was run but was terminated by a
signal) then shping.error is the signal number.

If shping.status is 3 (the command did not complete) then shping.error
is a PCP error code: see pmerr(1). Of particular relevance is -1008
(PM_ERR_TIMEOUT) when the command failed to complete in the time
specified by shping.control.timeout.

If shping.status is 4 (the command was not run) then shping.error is the
value of errno.

Otherwise shping.error will be zero.

shping.cmd [commands run by shping PMDA]
Help:
The text of each sh(1) command run by the shping PMDA.

shping.time.real [elapsed time for a command]
Help:
This metric records the elapsed time in milliseconds for the most recent
execution of each command to be run by the shping PMDA.

Care should be used when interpreting the value if the corresponding
value for shping.status is non-zero, as the command may not have run to
completion.

If the command timed out, shping.time.real will be -1.

shping.time.cpu_usr [user mode CPU time for a command]
Help:
This metric records the user mode CPU time in milliseconds for the most
recent execution of each command to be run by the shping PMDA.

Care should be used when interpreting the value if the corresponding
value for shping.status is non-zero, as the command may not have run to
completion.

If the command timed out, shping.time.cpu_usr will be -1.

shping.time.cpu_sys [system mode CPU time for a command]
Help:
This metric records the system mode CPU time in milliseconds for the
most recent execution of each command to be run by the shping PMDA.

Care should be used when interpreting the value if the corresponding
value for shping.status is non-zero, as the command may not have run to
completion.

If the command timed out, shping.time.cpu_sys will be -1.

shping.control.numcmd [number of commands in the group to be run by the shping PMDA]
Help:
number of commands in the group to be run by the shping PMDA

shping.control.cycles [number of times the command group has been run by the shping PMDA]
Help:
number of times the command group has been run by the shping PMDA

shping.control.cycletime [shping PMDA cycle time]
Help:
All commands run by the shping PMDA are executed one after another in a
group, and the group is run once per "cycle" time. This metric reports
the cycle time in seconds.

The cycle time may be changed dynamically by modifying this metric with
pmstore(1).

shping.control.timeout [shping PMDA timeout period]
Help:
The number of seconds the shping PMDA is willing to wait before
considering a single command to have timed out and killing it off.

The time out interval may be changed dynamically by modifying this
metric with pmstore(1).

shping.control.debug [shping PMDA debug flag]
Help:
The debug flag for the shping PMDA (see pmdbg(1)). All trace and
diagnostic files are created in /var/adm/pcplog (unless $PCP_LOGDIR is
set in the environment, see PMAPI(3)).

The debug flags DBG_TRACE_APPL0 (2048) and DBG_TRACE_APPL1 (4096) may be
used as follows:

DBG_TRACE_APPL0 - additional trace messages associated with the running
                  of each command appear in shping.log
DBG_TRACE_APPL1 - the standard output and standard error of each command
                  is appended to shping.out (instead of the default
                  /dev/null)

The debug flags may be changed dynamically by modifying this metric with
pmstore(1), e.g.
    $ pmstore shping.control.debug 6144
would enable both of the diagnostic traces associated with
DBG_TRACE_APPL0 and DBG_TRACE_APPL1.

$ pminfo -f shping

shping.status
    inst [0 or "null"] value 0
    inst [1 or "date"] value 0
    inst [2 or "sum"] value 0
    inst [3 or "cc"] value 0
    inst [4 or "dns"] value 0
    inst [5 or "dns-self"] value 0
    inst [6 or "dns-err"] value 0
    inst [7 or "ypserv"] value 1
    inst [8 or "rpcbind"] value 0
    inst [9 or "smtp"] value 0
    inst [10 or "nntp"] value 0
    inst [11 or "hippi"] value 0
    inst [12 or "autofsd"] value 0

shping.error
    inst [0 or "null"] value 0
    inst [1 or "date"] value 0
    inst [2 or "sum"] value 0
    inst [3 or "cc"] value 0
    inst [4 or "dns"] value 0
    inst [5 or "dns-self"] value 0
    inst [6 or "dns-err"] value 0
    inst [7 or "ypserv"] value 1
    inst [8 or "rpcbind"] value 0
    inst [9 or "smtp"] value 0
    inst [10 or "nntp"] value 0
    inst [11 or "hippi"] value 0
    inst [12 or "autofsd"] value 0

shping.cmd
    inst [0 or "null"] value "exit 0"
    inst [1 or "date"] value "/sbin/date"
    inst [2 or "sum"] value "sum /unix"
    inst [3 or "cc"] value "cd /tmp; rm -f $$.[oc] $$; echo "main(){printf(\"g'day world\\\\n\");}" >/tmp/$$.c; cc -o $$ $$.c; ./$$; rm -f $$.[oc] $$"
    inst [4 or "dns"] value "nslookup - 134.14.52.130
; Thu, 2 Nov 2000 16:08:36 -0800
Received: from deliverator.sgi.com ([204.94.214.10]:53556 "EHLO deliverator.sgi.com") by oss.sgi.com with ESMTP id ; Thu, 2 Nov 2000 16:08:18 -0800
Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by deliverator.sgi.com (980309.SGI.8.8.8-aspam-6.2/980310.SGI-aspam) via SMTP id
QAA09709 for ; Thu, 2 Nov 2000 16:00:27 -0800 (PST) mail_from (kaos@melbourne.sgi.com) Received: from kao2.melbourne.sgi.com (kao2.melbourne.sgi.com [134.14.55.180]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA02857; Fri, 3 Nov 2000 11:06:57 +1100 X-Mailer: exmh version 2.1.1 10/15/1999 From: Keith Owens To: Alan Bailey cc: pcp@oss.sgi.com Subject: Re: Suggested way of monitoring processes? In-reply-to: Your message of "Thu, 02 Nov 2000 13:32:51 MDT." Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Fri, 03 Nov 2000 11:06:56 +1100 Message-ID: <1787.973210016@kao2.melbourne.sgi.com> Sender: owner-pcp@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;pcp-outgoing On Thu, 2 Nov 2000 13:32:51 -0600 (CST), Alan Bailey wrote: >I'm looking to monitor a few things that aren't currently a part of the >PCP pmdas. I want to check to make sure that a few important processes >are running on the host being logged. These would be things like inetd, >sshd, and other processes. I figure there might be two ways of doing >this. > >2. I could utilize the current proc.psinfo.pid pmda that returns the whole >process tree, and see if they are running by parsing through that. I am reliably informed that pmie is designed to do this type of work. You write pmie rules that check the resources you are interested in and issue an alarm if they are missing. From owner-pcp@oss.sgi.com Thu Nov 2 23:13:56 2000 Received: by oss.sgi.com id ; Thu, 2 Nov 2000 23:13:47 -0800 Received: from tah14.ctt.cz ([194.108.115.182]:54541 "EHLO arthur.plbohnice.cz") by oss.sgi.com with ESMTP id ; Thu, 2 Nov 2000 23:13:26 -0800 Received: (from lemming@localhost) by arthur.plbohnice.cz (8.9.3/8.10.1) id IAA06305 for pcp@oss.sgi.com; Fri, 3 Nov 2000 08:13:10 +0100 Date: Fri, 3 Nov 2000 08:13:10 +0100 From: The Lemming To: pcp@oss.sgi.com Subject: Re: Suggested way of monitoring processes? 
Message-ID: <20001103081310.A6224@arthur.plbohnice.cz>
References:
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0.1i
In-Reply-To: ; from abailey@ncsa.uiuc.edu on Thu, Nov 02, 2000 at 01:32:51PM -0600
Sender: owner-pcp@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;pcp-outgoing

> I'm looking to monitor a few things that aren't currently a part of the
> PCP pmdas. I want to check to make sure that a few important processes
> are running on the host being logged. These would be things like inetd,
> sshd, and other processes. I figure there might be two ways of doing
> this.

I must say that I don't use PCP for this. We have a web portal, so we use PCP
only for performance monitoring. For availability monitoring, we use Spong. It
not only checks for processes, but also for disk, CPU load, ... Another part of
it does remote monitoring that checks ping, http server function (via trying
GET), smtp server (checks for welcome message) and many others.

Spong allows you to define whom to page (send email) for which server and/or
service, even depending on the time of the event, allows you to delay a message
for some time to prevent false alarms, it can send an alarm message repeatedly
until the problem is acknowledged via the interface and so on. (I haven't
investigated pmie, so I don't know whether it has these functions.)

There are more tools like that, I know of three that work well:

Spong       - http://spong.sourceforge.net
BigBrother  - http://www.bb4.com
NetSaint    - http://www.netsaint.org

So check them out to see whether they provide what you need.
Michal Kara From owner-pcp@oss.sgi.com Sun Nov 5 14:20:14 2000 Received: by oss.sgi.com id ; Sun, 5 Nov 2000 14:20:04 -0800 Received: from deliverator.sgi.com ([204.94.214.10]:58635 "EHLO deliverator.sgi.com") by oss.sgi.com with ESMTP id ; Sun, 5 Nov 2000 14:19:46 -0800 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by deliverator.sgi.com (980309.SGI.8.8.8-aspam-6.2/980310.SGI-aspam) via SMTP id OAA16735 for ; Sun, 5 Nov 2000 14:11:54 -0800 (PST) mail_from (nathans@wobbly.melbourne.sgi.com) Received: from wobbly.melbourne.sgi.com (wobbly.melbourne.sgi.com [134.14.55.135]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id JAA18534; Mon, 6 Nov 2000 09:18:16 +1100 Received: (from nathans@localhost) by wobbly.melbourne.sgi.com (980427.SGI.8.8.8/980728.SGI.AUTOCF) id JAA15777; Mon, 6 Nov 2000 09:17:50 +1100 (EDT) From: "Nathan Scott" Message-Id: <10011060917.ZM115267@wobbly.melbourne.sgi.com> Date: Mon, 6 Nov 2000 09:17:47 -0400 In-Reply-To: The Lemming "Re: Suggested way of monitoring processes?" (Nov 3, 8:13am) References: <20001103081310.A6224@arthur.plbohnice.cz> X-Mailer: Z-Mail (3.2.3 08feb96 MediaMail) To: The Lemming , pcp@oss.sgi.com Subject: Re: Suggested way of monitoring processes? Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-pcp@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;pcp-outgoing hi, On Nov 3, 8:13am, The Lemming wrote: > Subject: Re: Suggested way of monitoring processes? > ... > I must say that I don't use PCP for this. We have a web portal, so we use PCP > only for performance monitoring. For availability monitoring, we use Spong. It > not only checks for processes, but also for disk, CPU load, ... Other part of it > does remote monitoring that checks ping, http server function (via trying GET), > smtp server (checks for welcome message) and many others. 
>
> Spong allows you to define whom to page (send email) for which server and/or
> service, even depending on the time of the event, allows you to delay message
> for some time to prevent false alarms, it can send alarm message repeatedly
> until problem is acknowledged via interface and so on. (I haven't investigated
> pmie, so I don't know whether it has these functions.)
> ...

Yes, pmie has all of these functions. Used in conjunction with the
(not yet opensource, but maybe one day?) shping PMDA, or a more
specific PMDA like httpd/cisco/..., it's very useful for remote
service availability and response-time monitoring. It has been
used in base-IRIX to do exactly that for some time now.

cheers.

NAME
     pmie - inference engine for performance metrics

DESCRIPTION
     pmie accepts a collection of arithmetic, logical, and rule
     expressions to be evaluated at specified frequencies. The base
     data for the expressions consists of performance metrics values
     delivered in real-time from any host running the Performance
     Metrics Collection Daemon (PMCD), or using historical data from
     Performance Co-Pilot (PCP) archive logs.

     As well as computing arithmetic and logical values, pmie can
     execute actions (popup alarms, write system log messages, and
     launch programs) in response to specified conditions. Such
     actions are extremely useful in detecting, monitoring and
     correcting performance related problems.
-- Nathan From owner-pcp@oss.sgi.com Mon Nov 6 04:43:38 2000 Received: by oss.sgi.com id ; Mon, 6 Nov 2000 04:43:28 -0800 Received: from tah14.ctt.cz ([194.108.115.182]:31507 "EHLO arthur.plbohnice.cz") by oss.sgi.com with ESMTP id ; Mon, 6 Nov 2000 04:43:18 -0800 Received: (from lemming@localhost) by arthur.plbohnice.cz (8.9.3/8.10.1) id NAA24006 for pcp@oss.sgi.com; Mon, 6 Nov 2000 13:42:36 +0100 Date: Mon, 6 Nov 2000 13:42:36 +0100 From: Michal Kara To: pcp@oss.sgi.com Subject: PCPMON 1.3.0 released Message-ID: <20001106134236.A23952@arthur.plbohnice.cz> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0.1i Sender: owner-pcp@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;pcp-outgoing Hello all! PCPMON 1.3.0 was finally released. It includes archive mode (which was in 1.2.95 too) screen-shot saving (in PNG format) and command-line mode (saves first screenshot). Numerous bugs were fixed, too. Download from http://k332.feld.cvut.cz/~lemming/projects/pcpmon-1.3.0.tar.gz and test, please :) Michal Kara From owner-pcp@oss.sgi.com Mon Nov 6 15:23:24 2000 Received: by oss.sgi.com id ; Mon, 6 Nov 2000 15:23:14 -0800 Received: from deliverator.sgi.com ([204.94.214.10]:3923 "EHLO deliverator.sgi.com") by oss.sgi.com with ESMTP id ; Mon, 6 Nov 2000 15:22:57 -0800 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by deliverator.sgi.com (980309.SGI.8.8.8-aspam-6.2/980310.SGI-aspam) via SMTP id PAA19997 for ; Mon, 6 Nov 2000 15:15:04 -0800 (PST) mail_from (nathans@wobbly.melbourne.sgi.com) Received: from wobbly.melbourne.sgi.com (wobbly.melbourne.sgi.com [134.14.55.135]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id KAA27391; Tue, 7 Nov 2000 10:20:22 +1100 Received: (from nathans@localhost) by wobbly.melbourne.sgi.com (980427.SGI.8.8.8/980728.SGI.AUTOCF) id KAA14132; Tue, 7 Nov 2000 10:20:19 +1100 (EDT) From: "Nathan Scott" Message-Id: 
<10011071020.ZM117549@wobbly.melbourne.sgi.com>
Date: Tue, 7 Nov 2000 10:20:18 -0400
In-Reply-To: bobyetman@att.net "Loadavg calculation" (Nov, 5 12:55pm)
X-Mailer: Z-Mail (3.2.3 08feb96 MediaMail)
To: bobyetman@att.net
Subject: Re: Loadavg calculation
Cc: linux-kernel@vger.kernel.org, pcp@oss.sgi.com
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender: owner-pcp@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;pcp-outgoing

hi,

As you've suggested, you'd be better off not using the load
average but rather some other measure (or combination of
measures) to figure out when you have enough spare cycles or
bandwidth.

The "pmie" tool might be useful to you - here's a contrived
example I just knocked up (instead of a "print" you'd want to
run your program via the "shell" keyword) with an occasional
artificial load in the background. kernel.all.cpu.idle is
aggregate idle time across all cpus. pmie converts it to a
rate (#idle milliseconds / 8 seconds) so it will always have
a value between 0 (no idle time) and 1 (lots of idle time).

$ pmie -t 8sec -v
( kernel.all.cpu.idle > 0.5 ) -> print "start a new job";
^D
expr_1: ?
Tue Nov 7 09:33:36 2000: start a new job
expr_1: true
Tue Nov 7 09:33:44 2000: start a new job
expr_1: true
Tue Nov 7 09:33:52 2000: start a new job
expr_1: true
expr_1: false
expr_1: false
expr_1: false
Tue Nov 7 09:34:24 2000: start a new job
expr_1: true
Tue Nov 7 09:34:32 2000: start a new job
expr_1: true
expr_1: false
expr_1: false
Tue Nov 7 09:34:56 2000: start a new job
expr_1: true

pmie is one of the gpl'd pcp tools which you can get from the
sgi oss site... hope it's useful to you. mailto the pcp list
if you need any more info.

cheers.

bobyetman@att.net wrote:
>
> I'm working a project at work that is using Linux to run some very
> math-intensive calculations. One of the things we do is use the 1-minute
> loadavg to determine how busy the machine is and can we fire off another
> program to do more calculations.
> However, there's a problem with that.
>
> Because it's a 1 minute load average, there's quite a bit of lag time from
> when 1 program finishes until the loadavg goes down below a threshold for
> our control mechanism to fire off another program.
>
> Let me give an example (all on a 1-cpu PC)
>
> HH:MM:SS
> 00:00:00 fire off 4 programs
> 00:01:00 loadavg goes up to 4
> 00:01:30 3 of the 4 programs finish loadavg still at 4
> 00:02:20 load avg goes down to 1, below our threshold
> 00:02:21 we fire off 3 more programs.
>
> We'd like to reduce that almost 50 second lag time. Is it possible, in
> user-space, to duplicate the loadavg calculation period, say to a 15
> second load average, using the information in /proc?
>
> The other option we looked at, besides using loadavg, was using idle pct%,
> but if I read the source for top right, involves reading the entire
> process table to calculate clock ticks used and then figuring out how many
> weren't used.
>
> Ideas, opinions welcome. Yes, I read the list, so either respond direct
> to me, or to the list.
>
> bobyetman@att.net (Robert A.
Yetman) > > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > Please read the FAQ at http://www.tux.org/lkml/ -- Nathan -- Nathan From owner-pcp@oss.sgi.com Tue Nov 7 10:31:40 2000 Received: by oss.sgi.com id ; Tue, 7 Nov 2000 10:31:21 -0800 Received: from ex1.ncsa.uiuc.edu ([141.142.2.9]:7820 "EHLO ex1.ncsa.uiuc.edu") by oss.sgi.com with ESMTP id ; Tue, 7 Nov 2000 10:31:03 -0800 Received: from mx1.ncsa.uiuc.edu (mx1.ncsa.uiuc.edu [141.142.2.8]) by ex1.ncsa.uiuc.edu (8.11.0/8.11.0) with ESMTP id eA7IV1W24482 for ; Tue, 7 Nov 2000 12:31:01 -0600 (CST) X-Envelope-From: abailey@ncsa.uiuc.edu X-Envelope-To: Received: from osage.ncsa.uiuc.edu (osage.ncsa.uiuc.edu [141.142.2.56]) by mx1.ncsa.uiuc.edu (8.11.0/8.11.0) with ESMTP id eA7IV1322592 for ; Tue, 7 Nov 2000 12:31:01 -0600 (CST) Received: from localhost (abailey@localhost) by osage.ncsa.uiuc.edu (8.9.3/8.9.3) with ESMTP id MAA21497 for ; Tue, 7 Nov 2000 12:31:00 -0600 Date: Tue, 7 Nov 2000 12:30:59 -0600 (CST) From: Alan Bailey To: pcp@oss.sgi.com Subject: Re: Suggested way of monitoring processes? In-Reply-To: <10011060917.ZM115267@wobbly.melbourne.sgi.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-pcp@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;pcp-outgoing I've been messing around with pmie now. As a first try, I'm writing a little rule to monitor an sshd process. Here it is: delta = 3 seconds; sshd = some_inst match_inst "sshd" ( proc.psinfo.pid > 0 ) -> shell 60 seconds "echo 'it exists' | mail -s 'it exists' abailey" I've been running pmie from the command line, and the output sometimes looks like this: [root@lanner pmie]# pmie -v ./config.default sshd: true sshd: ? sshd: ? sshd: true sshd: true sshd: true sshd: true sshd: true <- I killed the process here sshd: ? sshd: false sshd: false sshd: false sshd: false <- I started the process again here sshd: ? 
sshd: true sshd: true sshd: ? sshd: true So, there are ?'s appearing in places where I think they shouldn't. First, do ?'s occur when the instance that I'm looking for does not exist? Why is there always one during each transition, and why do they appear in the middle of streams of 'true's? Does anyone have any insight in this problem, and possibly how I could get around it? Alan On Mon, 6 Nov 2000, Nathan Scott wrote: > hi, > > On Nov 3, 8:13am, The Lemming wrote: > > Subject: Re: Suggested way of monitoring processes? > > ... > > I must say that I don't use PCP for this. We have a web portal, so we use PCP > > only for performance monitoring. For availability monitoring, we use Spong. It > > not only checks for processes, but also for disk, CPU load, ... Other part of it > > does remote monitoring that checks ping, http server function (via trying GET), > > smtp server (checks for welcome message) and many others. > > > > Spong allows you to define whom to page (send email) for which server and/or > > service, even depending on the time of the event, allows you to delay message > > for some time to prevent false alarms, it can send alarm message repeatedly > > until problem is acknowledged via interface and so on. (I didn't investigated > > pmie, so I don't know whether it has these functions.) > > ... > > Yes, pmie has all of these functions. Used in conjuction with > the (not yet opensource, but maybe one day?) shping PMDA, or > a more specific PMDA like httpd/cisco/..., its very useful for > remote service availability and response-time monitoring. It > has been used in base-IRIX to do exactly that for some time now. > > cheers. > > > NAME > pmie - inference engine for performance metrics > > DESCRIPTION > pmie accepts a collection of arithmetic, logical, and rule expressions to > be evaluated at specified frequencies. 
The base data for the expressions > consists of performance metrics values delivered in real-time from any > host running the Performance Metrics Collection Daemon (PMCD), or using > historical data from Performance Co-Pilot (PCP) archive logs. > > As well as computing arithmetic and logical values, pmie can execute > actions (popup alarms, write system log messages, and launch programs) in > response to specified conditions. Such actions are extremely useful in > detecting, monitoring and correcting performance related problems. > > > -- > Nathan > -- Alan Bailey From owner-pcp@oss.sgi.com Tue Nov 7 15:04:42 2000 Received: by oss.sgi.com id ; Tue, 7 Nov 2000 15:04:32 -0800 Received: from deliverator.sgi.com ([204.94.214.10]:41223 "EHLO deliverator.sgi.com") by oss.sgi.com with ESMTP id ; Tue, 7 Nov 2000 15:04:08 -0800 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by deliverator.sgi.com (980309.SGI.8.8.8-aspam-6.2/980310.SGI-aspam) via SMTP id OAA10266 for ; Tue, 7 Nov 2000 14:56:16 -0800 (PST) mail_from (nathans@wobbly.melbourne.sgi.com) Received: from wobbly.melbourne.sgi.com (wobbly.melbourne.sgi.com [134.14.55.135]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id KAA04772; Wed, 8 Nov 2000 10:02:48 +1100 Received: (from nathans@localhost) by wobbly.melbourne.sgi.com (980427.SGI.8.8.8/980728.SGI.AUTOCF) id KAA12829; Wed, 8 Nov 2000 10:02:47 +1100 (EDT) From: "Nathan Scott" Message-Id: <10011081002.ZM116762@wobbly.melbourne.sgi.com> Date: Wed, 8 Nov 2000 10:02:45 -0400 In-Reply-To: Alan Bailey "Re: Suggested way of monitoring processes?" (Nov 7, 12:30pm) References: X-Mailer: Z-Mail (3.2.3 08feb96 MediaMail) To: Alan Bailey Subject: Re: Suggested way of monitoring processes? 
Cc: pcp@oss.sgi.com Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-pcp@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;pcp-outgoing hi Alan, On Nov 7, 12:30pm, Alan Bailey wrote: > Subject: Re: Suggested way of monitoring processes? > I've been messing around with pmie now. As a first try, I'm writing a > little rule to monitor an sshd process. Here it is: > very nice! > delta = 3 seconds; > sshd = > some_inst match_inst "sshd" ( > proc.psinfo.pid > 0 > ) -> shell 60 seconds "echo 'it exists' | mail -s 'it exists' abailey" > > I've been running pmie from the command line, and the output sometimes > looks like this: > [snip] > > So, there are ?'s appearing in places where I think they shouldn't. > First, do ?'s occur when the instance that I'm looking for does not exist? > Why is there always one during each transition, and why do they appear in > the middle of streams of 'true's? > > Does anyone have any insight in this problem, and possibly how I could get > around it? > pmie -v prints a '?' when it believes it doesn't have enough information to completely evaluate the expression. i've usually come across it when evaluating counter metrics (rate conversion requires two values), but that isn't the case here. in this case, what i think is happening (from some experiments using "sleep" in place of "sshd") is that whenever the set of instances coming back from match_inst changes, pmie throws its hands up in disgust, resets itself for the next metric fetch and gives up on the current one. this (i believe, Ken knows this code better than i do ;) is why we get one '?' after each state change (sshd stop/start) and then good data. i don't really agree this is the correct behavior for this situation, but i'll defer to Ken - perhaps there's something i've missed. for the second case where you see a '?' 
in a string of 'true's - the only way I could reproduce that one was to have one "sleep" running and then to start another (which is the same problem as above - the set of instances coming back from match_inst changes) - is it possible you had one sshd running & then started another? so, i don't think theres any situation where pmie is lying to you, its just a little indecisive at times :-)... it may be possible to improve this. for the purpose of tracking long-running processes this shouldn't be too much of a problem (with relatively small metric fetch deltas), but its certainly annoying though. cheers. -- Nathan From owner-pcp@oss.sgi.com Tue Nov 7 17:02:24 2000 Received: by oss.sgi.com id ; Tue, 7 Nov 2000 17:02:04 -0800 Received: from deliverator.sgi.com ([204.94.214.10]:8495 "EHLO deliverator.sgi.com") by oss.sgi.com with ESMTP id ; Tue, 7 Nov 2000 17:01:41 -0800 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by deliverator.sgi.com (980309.SGI.8.8.8-aspam-6.2/980310.SGI-aspam) via SMTP id QAA00481 for ; Tue, 7 Nov 2000 16:53:49 -0800 (PST) mail_from (markgw@sgi.com) Received: from sandpit.melbourne.sgi.com (sandpit.melbourne.sgi.com [134.14.55.132]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA05830; Wed, 8 Nov 2000 11:59:00 +1100 Date: Wed, 8 Nov 2000 11:58:59 +1100 (EST) From: Mark Goodwin X-Sender: markgw@sandpit.melbourne.sgi.com To: Michal Kara cc: pcp@oss.sgi.com Subject: Re: PCPMON 1.3.0 released In-Reply-To: <20001106134236.A23952@arthur.plbohnice.cz> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-pcp@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;pcp-outgoing On Mon, 6 Nov 2000, Michal Kara wrote: > PCPMON 1.3.0 was finally released. It includes archive mode (which was in > 1.2.95 too) screen-shot saving (in PNG format) and command-line mode (saves > first screenshot). Numerous bugs were fixed, too. 
Download from > http://k332.feld.cvut.cz/~lemming/projects/pcpmon-1.3.0.tar.gz and test, please Michal, I downloaded pcpmon 1.3.0 and had some compilation problems: sherman 20% ./configure loading cache ./config.cache ... [much deleted for brevity] ... checking for gdImagePng in -lgd... no configure: error: libgd not found sherman 21% rpm -qal | fgrep libgd. /usr/lib/libgd.so.1.2 /usr/lib/libgd.so sherman 22% rpm -qal | fgrep libpng. /usr/lib/libpng.a /usr/lib/libpng.so /usr/man/man3/libpng.3.gz /usr/doc/libpng-1.0.5/libpng.txt /usr/lib/libpng.so.2.1.0.5 So I have both gd and png, but configure still failed. After running "./configure --disable-gd; make" I now get errors with missing xml headers: make[2]: Entering directory `/build/markgw/isms/pcpmon/pcpmon-1.3.0/src' gcc -DHAVE_CONFIG_H -I. -I. -I.. -I/usr/lib/glib/include -I/usr/X11R6/include -g -O2 -Wall -DDISABLEGD -c main.c In file included from main.c:16: file.h:12: libxml/parser.h: No such file or directory In file included from main.c:17: display.h:12: libxml/parser.h: No such file or directory ... and more errors follow. This is on a Redhat6.2 system, with "everything" installed. I seem to have /usr/include/gnome-xml/parser.h but don't have libxml/parser.h. So where do I get the required xml stuff? Also, on your PCPMON homepage at http://k332.feld.cvut.cz/~lemming/projects/pcpmon.html you asked "if you create RPMs, let me know". You can set up your src tree using a tool called "gensrc", and then easily build both src and bin RPMs. 
The gensrc home page is http://oss.sgi.com/projects/gensrc thanks -- Mark From owner-pcp@oss.sgi.com Tue Nov 7 23:10:26 2000 Received: by oss.sgi.com id ; Tue, 7 Nov 2000 23:10:17 -0800 Received: from tah14.ctt.cz ([194.108.115.182]:61705 "EHLO arthur.plbohnice.cz") by oss.sgi.com with ESMTP id ; Tue, 7 Nov 2000 23:10:01 -0800 Received: (from lemming@localhost) by arthur.plbohnice.cz (8.9.3/8.10.1) id IAA02497 for pcp@oss.sgi.com; Wed, 8 Nov 2000 08:09:30 +0100 Date: Wed, 8 Nov 2000 08:09:30 +0100 From: Michal Kara To: pcp@oss.sgi.com Subject: Re: PCPMON 1.3.0 released Message-ID: <20001108080930.A2458@arthur.plbohnice.cz> References: <20001106134236.A23952@arthur.plbohnice.cz> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0.1i In-Reply-To: ; from markgw@sgi.com on Wed, Nov 08, 2000 at 11:58:59AM +1100 Sender: owner-pcp@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;pcp-outgoing > So I have both gd and png, but configure still failed. But is there the gdImagePng function? There are more versions of gd library. The older could do only GIFs but in the newer versions GIF creation capability was dropped due to (well known) licensing problems and now it can create PNGs. Please send me config.log from the failed attempt. > "./configure --disable-gd; make" I now get errors with missing xml headers: > > This is on a Redhat6.2 system, with "everything" installed. I seem to have > /usr/include/gnome-xml/parser.h but don't have libxml/parser.h. > So where do I get the required xml stuff? This suggests you have 1.x version of libxml library. New PCPMON now utilizes version 2.x. It seems I will have either improve configure to not accept 1.x version or make PCPMON able to cope with both versions of the library :( > Also, on your PCPMON homepage at > http://k332.feld.cvut.cz/~lemming/projects/pcpmon.html > you asked "if you create RPMs, let me know". 
You can set up your > src tree using a tool called "gensrc", and then easily build both > src and bin RPMs. The gensrc home page is http://oss.sgi.com/projects/gensrc OK, thanks. Will this handle RPM dependencies too? Michal From owner-pcp@oss.sgi.com Wed Nov 8 02:11:48 2000 Received: by oss.sgi.com id ; Wed, 8 Nov 2000 02:11:39 -0800 Received: from mail.ole.es ([195.235.51.25]:58669 "EHLO mail.ole.es") by oss.sgi.com with ESMTP id ; Wed, 8 Nov 2000 02:11:15 -0800 Received: from corona.ole.es (corona [194.30.23.13]) by mail.ole.es (8.9.1/8.9.1) with ESMTP id LAA14813923; Wed, 8 Nov 2000 11:16:16 +0100 (CET) Date: Wed, 8 Nov 2000 11:19:48 +0100 (CET) From: =?ISO-8859-1?Q?Vicente_Arteaga_G=F3mez?= X-Sender: se03726@corona.ole.es To: Michal Kara cc: pcp@oss.sgi.com Subject: Re: PCPMON 1.3.0 released In-Reply-To: <20001108080930.A2458@arthur.plbohnice.cz> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-pcp@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;pcp-outgoing Hi, You need gd version >= 1.6 for Png support. And yes, Gif generation was dropped on following versions. On Wed, 8 Nov 2000, Michal Kara wrote: > > So I have both gd and png, but configure still failed. > > But is there the gdImagePng function? There are more versions of gd library. > The older could do only GIFs but in the newer versions GIF creation capability > was dropped due to (well known) licensing problems and now it can create PNGs. > Please send me config.log from the failed attempt. > > > "./configure --disable-gd; make" I now get errors with missing xml headers: > > > > This is on a Redhat6.2 system, with "everything" installed. I seem to have > > /usr/include/gnome-xml/parser.h but don't have libxml/parser.h. > > So where do I get the required xml stuff? > > This suggests you have 1.x version of libxml library. New PCPMON now utilizes > version 2.x. 
It seems I will have either improve configure to not accept 1.x > version or make PCPMON able to cope with both versions of the library :( > > > Also, on your PCPMON homepage at > > http://k332.feld.cvut.cz/~lemming/projects/pcpmon.html > > you asked "if you create RPMs, let me know". You can set up your > > src tree using a tool called "gensrc", and then easily build both > > src and bin RPMs. The gensrc home page is http://oss.sgi.com/projects/gensrc > > OK, thanks. Will this handle RPM dependencies too? > > Michal > From owner-pcp@oss.sgi.com Wed Nov 8 15:32:51 2000 Received: by oss.sgi.com id ; Wed, 8 Nov 2000 15:32:41 -0800 Received: from deliverator.sgi.com ([204.94.214.10]:29004 "EHLO deliverator.sgi.com") by oss.sgi.com with ESMTP id ; Wed, 8 Nov 2000 15:32:20 -0800 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by deliverator.sgi.com (980309.SGI.8.8.8-aspam-6.2/980310.SGI-aspam) via SMTP id PAA15901 for ; Wed, 8 Nov 2000 15:24:28 -0800 (PST) mail_from (markgw@sgi.com) Received: from sandpit.melbourne.sgi.com (sandpit.melbourne.sgi.com [134.14.55.132]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id KAA13761; Thu, 9 Nov 2000 10:29:40 +1100 Date: Thu, 9 Nov 2000 10:29:40 +1100 (EST) From: Mark Goodwin X-Sender: markgw@sandpit.melbourne.sgi.com To: Michal Kara cc: pcp@oss.sgi.com Subject: Re: PCPMON 1.3.0 released In-Reply-To: <20001108080930.A2458@arthur.plbohnice.cz> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-pcp@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;pcp-outgoing On Wed, 8 Nov 2000, Michal Kara wrote: > > So I have both gd and png, but configure still failed. > > But is there the gdImagePng function? There are more versions of gd library. > The older could do only GIFs but in the newer versions GIF creation capability > was dropped due to (well known) licensing problems and now it can create PNGs. 
> Please send me config.log from the failed attempt. gdImageGif is there, but gdImagePng is not. I would suggest you add conditional configure stuff to use gdImageGif if gdImagePng is not present. If the gif function is in a shared lib you happen to be linking with, then it's not your licensing problem ... Here's the relevant bit of the log: ... configure:2150: checking for gdImagePng in -lgd configure:2169: gcc -o conftest -g -O2 -Wall conftest.c -lgd -lz -lxml -lpcp -lgd -lpng 1>&5 /tmp/ccOsX1IY.o: In function `main': /home/markgw/isms/pcpmon/pcpmon-1.3.0/configure:2165: undefined reference to `gdImagePng' collect2: ld returned 1 exit status configure: failed program was: #line 2158 "configure" #include "confdefs.h" /* Override any gcc2 internal prototype to avoid an error. */ /* We use char because int might match the return type of a gcc2 builtin and then its argument prototype would still apply. */ char gdImagePng(); int main() { gdImagePng() ; return 0; } > > > "./configure --disable-gd; make" I now get errors with missing xml headers: > > > This suggests you have 1.x version of libxml library. New PCPMON now utilizes > version 2.x. It seems I will have either improve configure to not accept 1.x > version or make PCPMON able to cope with both versions of the library :( Is libxml v2 available somewhere? > > > Also, on your PCPMON homepage at > > http://k332.feld.cvut.cz/~lemming/projects/pcpmon.html > > you asked "if you create RPMs, let me know". You can set up your > > src tree using a tool called "gensrc", and then easily build both > > src and bin RPMs. The gensrc home page is http://oss.sgi.com/projects/gensrc > > OK, thanks. Will this handle RPM dependencies too? RPM picks up all shared lib deps automatically. In addition, you can add dependencies on specific packages in your spec. If needed, you can also specify a dependency on a particular version of a package (or range of versions). 
Here's a simple gensrc example: sherman 16% gensrc pcpmon gensrc: done. See "pcpmon/Porting-Guide" for further instructions. sherman 17% cd pcpmon sherman 18% ./Makepkgs == configure, log is Logs/configure == default, log is Logs/default == dist, log is Logs/dist Wrote: /home/markgw/pcpmon/build/rpm/pcpmon-1.0.0-1.src.rpm Wrote: /home/markgw/pcpmon/build/rpm/pcpmon-1.0.0-1.i386.rpm Wrote: /home/markgw/pcpmon/build/tar/pcpmon-1.0.0.tar.gz sherman 19% head build/rpm/pcpmon.spec.in # Name: @package_name@ Version: @package_version@ Release: @package_release@ Distribution: @package_distribution@ Packager: @package_builder@ BuildRoot: @build_root@ Source: @package_name@-@package_version@.src.tar.gz Summary: PCPMON is a package for doing something useful. ... You should edit the VERSION file to specify the version of the package to build. To tell RPM that pcpmon requires pcp, gd-devel with version greater than 1.3 and libxml-devel with version greater than 1.8, edit build/rpm/pcpmon.spec.in and add something like the following after the "Source:" tag :- Requires: pcp >= 2.1.8, gd-devel > 1.3, libxml-devel > 1.8 -- Mark From owner-pcp@oss.sgi.com Wed Nov 8 22:10:13 2000 Received: by oss.sgi.com id ; Wed, 8 Nov 2000 22:10:03 -0800 Received: from deliverator.sgi.com ([204.94.214.10]:1596 "EHLO deliverator.sgi.com") by oss.sgi.com with ESMTP id ; Wed, 8 Nov 2000 22:09:47 -0800 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by deliverator.sgi.com (980309.SGI.8.8.8-aspam-6.2/980310.SGI-aspam) via SMTP id WAA07052 for ; Wed, 8 Nov 2000 22:01:55 -0800 (PST) mail_from (kenmcd@melbourne.sgi.com) From: kenmcd@melbourne.sgi.com Received: from [192.82.201.242] ([192.82.201.242]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id RAA16224; Thu, 9 Nov 2000 17:08:16 +1100 Date: Thu, 9 Nov 2000 17:10:25 +1100 (EST) Reply-To: kenmcd@melbourne.sgi.com To: Alan Bailey , Nathan Scott cc: pcp@oss.sgi.com Subject: Re: Suggested way of 
monitoring processes? In-Reply-To: <10011081002.ZM116762@wobbly.melbourne.sgi.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-pcp@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;pcp-outgoing On Tue, 7 Nov 2000, Alan Bailey wrote: > > I've been messing around with pmie now. As a first try, I'm writing a > little rule to monitor an sshd process. Here it is: > > delta = 3 seconds; > sshd = > some_inst match_inst "sshd" ( > proc.psinfo.pid > 0 > ) -> shell 60 seconds "echo 'it exists' | mail -s 'it exists' abailey" > > ... > > So, there are ?'s appearing in places where I think they shouldn't. > First, do ?'s occur when the instance that I'm looking for does not exist? > Why is there always one during each transition, and why do they appear in > the middle of streams of 'true's? > > Does anyone have any insight in this problem, and possibly how I could get > around it? On Wed, 8 Nov 2000, Nathan Scott wrote: > > hi Alan, > ... > in this case, what i think is happening (from some experiments > using "sleep" in place of "sshd") is that whenever the set of > instances coming back from match_inst changes, pmie throws its > hands up in disgust, resets itself for the next metric fetch > and gives up on the current one. > > this (i believe, Ken knows this code better than i do ;) is why > we get one '?' after each state change (sshd stop/start) and then > good data. i don't really agree this is the correct behavior for > this situation, but i'll defer to Ken - perhaps there's something > i've missed. pmie is a brave little camper, but sometimes the semantics of the rule evaluation are so complex that it must abandon partial results and cached state and start again when the set of instances for a particular metric is found to change (this is the technical version of "throws its hands up in disgust"). 
I've tried more aggressive schemes but they unfortunately produce incorrect results for more complex predicates and/or metrics with different semantics.

In Alan's case

    true  - means I am sure the predicate is true
    false - means I am sure the predicate is false
    ?     - means I am not sure

Fortunately in all the circumstances I've analyzed "not sure" is transient and the vast majority of the rule evaluations unambiguously return either true or false.

> so, i don't think theres any situation where pmie is lying to you,
> its just a little indecisive at times :-)... it may be possible to
> improve this. for the purpose of tracking long-running processes
> this shouldn't be too much of a problem (with relatively small
> metric fetch deltas), but its certainly annoying though.

Remember the design goal for PCP 7+ years ago was multiple distributed hosts each with 100+ CPUs and 1+ Terabyte of disk ... to manage this sort of environment we've consistently opted for scalability over micro accuracy, because these large and complex systems cannot be turned around quickly ... in this environment, instance domains change infrequently, and so the protocols and architecture are biased towards this state of affairs.

If the number and/or pids of processes matching the name sshd varies dramatically in your production environment, then there are other solutions that can be applied (this is an ideal fit for the shping PMDA) ... let me know if this is the case.

Note that if _all_ the sshd processes die (the case that is really of interest I presume) the _worst_ sequence you will see is:

    true
    ?
    false

so the detection is delayed for at most two pmie rule evaluation intervals (and on average 1.5 times the evaluation interval, or 4.5 sec in your example).
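
The delay figures follow from the true / ? / false sequence: if the last sshd dies at a random point within an evaluation interval, the next evaluation (half an interval later on average, a full interval at worst) reports '?', and the one after that reports false. A small sketch of that arithmetic in shell; the function name detection_delay is made up for illustration and is not part of PCP:

```shell
#!/bin/sh
# detection_delay DELTA
# DELTA is the pmie evaluation interval in whole seconds.
# Worst case: the process dies just after an evaluation, so
#   one interval until '?' plus one more until 'false' = 2 * DELTA.
# Average: half an interval until '?' plus one more = 1.5 * DELTA.
# The average is computed in tenths of a second to stay within
# integer shell arithmetic.
detection_delay() {
    delta=$1
    worst=$((2 * delta))
    avg_tenths=$((15 * delta))
    printf 'worst=%ds average=%d.%ds\n' \
        "$worst" "$((avg_tenths / 10))" "$((avg_tenths % 10))"
}

# detection_delay 3    prints: worst=6s average=4.5s
```

With the 3 second delta from Alan's rule this reproduces the 4.5 sec average quoted in the message.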
From owner-pcp@oss.sgi.com Tue Nov 14 09:45:28 2000 Received: by oss.sgi.com id ; Tue, 14 Nov 2000 09:45:08 -0800 Received: from ex1.ncsa.uiuc.edu ([141.142.2.9]:48842 "EHLO ex1.ncsa.uiuc.edu") by oss.sgi.com with ESMTP id ; Tue, 14 Nov 2000 09:44:57 -0800 Received: from mx1.ncsa.uiuc.edu (mx1.ncsa.uiuc.edu [141.142.2.8]) by ex1.ncsa.uiuc.edu (8.11.0/8.11.0) with ESMTP id eAEHiuW18725 for ; Tue, 14 Nov 2000 11:44:56 -0600 (CST) X-Envelope-From: abailey@ncsa.uiuc.edu X-Envelope-To: Received: from osage.ncsa.uiuc.edu (osage.ncsa.uiuc.edu [141.142.2.56]) by mx1.ncsa.uiuc.edu (8.11.0/8.11.0) with ESMTP id eAEHit318585 for ; Tue, 14 Nov 2000 11:44:55 -0600 (CST) Received: from localhost (abailey@localhost) by osage.ncsa.uiuc.edu (8.9.3/8.9.3) with ESMTP id LAA26151 for ; Tue, 14 Nov 2000 11:44:55 -0600 Date: Tue, 14 Nov 2000 11:44:55 -0600 (CST) From: Alan Bailey To: pcp@oss.sgi.com Subject: nfs monitoring Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-pcp@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;pcp-outgoing Content-Length: 295 Lines: 12 What is the suggested way to monitor nfs mounts? I want to make sure that nfs mounted directories are still mounted and up on the remote host. I couldn't find anything by looking at the nfs.* or nfs3.* metrics. 
Also, sorry for the many emails to this list :)

Thanks,
Alan

--
Alan Bailey

From owner-pcp@oss.sgi.com Mon Nov 20 04:20:05 2000 Received: by oss.sgi.com id ; Mon, 20 Nov 2000 04:19:55 -0800 Received: from tah14.ctt.cz ([194.108.115.182]:65033 "EHLO arthur.plbohnice.cz") by oss.sgi.com with ESMTP id ; Mon, 20 Nov 2000 04:19:34 -0800 Received: (from lemming@localhost) by arthur.plbohnice.cz (8.9.3/8.10.1) id NAA28068 for pcp@oss.sgi.com; Mon, 20 Nov 2000 13:18:53 +0100 Date: Mon, 20 Nov 2000 13:18:53 +0100 From: Michal Kara To: pcp@oss.sgi.com Subject: PMLogger question/suggestion Message-ID: <20001120131853.A27987@arthur.plbohnice.cz> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0.1i Sender: owner-pcp@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;pcp-outgoing Content-Length: 703 Lines: 26

Hello!

I wanted to ask whether there is an "include" functionality in pmlogger configuration. What I have is several computers, with several services (apache, zmailer, mysql,...) running on some of them. It would be nice to define metrics to gather from /apache/zmailer/mysql pmda and then have files for each of the computers like:

include 'config.std'
include 'config.apache'
include 'config.zmailer'

when apache & zmailer are running,

include 'config.std'
include 'config.zmailer'
include 'config.mysql'

when zmailer & mysql are running. And so on.

However, I did not find anything about 'include' in pmlogger man pages. Is it there somewhere?
Thanks, Michal From owner-pcp@oss.sgi.com Mon Nov 20 14:38:48 2000 Received: by oss.sgi.com id ; Mon, 20 Nov 2000 14:38:38 -0800 Received: from deliverator.sgi.com ([204.94.214.10]:54582 "EHLO deliverator.sgi.com") by oss.sgi.com with ESMTP id ; Mon, 20 Nov 2000 14:38:12 -0800 Received: from rattle.melbourne.sgi.com (rattle.melbourne.sgi.com [134.14.55.145]) by deliverator.sgi.com (980309.SGI.8.8.8-aspam-6.2/980310.SGI-aspam) via ESMTP id OAA22523 for ; Mon, 20 Nov 2000 14:30:18 -0800 (PST) mail_from (kenmcd@melbourne.sgi.com) Received: from localhost (kenmcd@localhost) by rattle.melbourne.sgi.com (SGI-8.9.3/8.9.3) with ESMTP id JAA67175; Tue, 21 Nov 2000 09:35:36 +1100 (AEDT) X-Authentication-Warning: rattle.melbourne.sgi.com: kenmcd owned process doing -bs Date: Tue, 21 Nov 2000 09:35:36 +1100 From: Ken McDonell Reply-To: kenmcd@sgi.com To: Michal Kara cc: pcp@oss.sgi.com Subject: Re: PMLogger question/suggestion In-Reply-To: <20001120131853.A27987@arthur.plbohnice.cz> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-pcp@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;pcp-outgoing Content-Length: 1097 Lines: 36 My initial reaction (given my old fart Unix status) is that I'd recommend using cpp to generate your pmlogger config files, rather than adding this functionality to pmlogger (this functionality certainly is not part of pmlogger at the moment). This is based on the "stay small, bloat less" mantra. On Mon, 20 Nov 2000, Michal Kara wrote: > Hello! > > I wanted to ask whether there is an "include" functionality in pmlogger > configuration. What I have is several computers, with several services (apache, > zmailer, mysql,...) running on some of them. 
It would be nice do define metrics > to gather from /apache/zmailer/mysql pmda and then have files for each > of the computers like: > > include 'config.std' > include 'config.apache' > include 'config.zmailer' > > when apache & zmailer are running, > > include 'config.std' > include 'config.zmailer' > include 'config.mysql' > > when zmailer & mysql are running. And so on. > > However, I did not found anything about 'include' in pmlogger man pages. Is it > there somewhere? > > Thanks, > Michal > From owner-pcp@oss.sgi.com Mon Nov 20 16:42:48 2000 Received: by oss.sgi.com id ; Mon, 20 Nov 2000 16:42:38 -0800 Received: from pneumatic-tube.sgi.com ([204.94.214.22]:8782 "EHLO pneumatic-tube.sgi.com") by oss.sgi.com with ESMTP id ; Mon, 20 Nov 2000 16:42:26 -0800 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by pneumatic-tube.sgi.com (980327.SGI.8.8.8-aspam/980310.SGI-aspam) via SMTP id QAA02403 for ; Mon, 20 Nov 2000 16:50:17 -0800 (PST) mail_from (markgw@sgi.com) Received: from sandpit.melbourne.sgi.com (sandpit.melbourne.sgi.com [134.14.55.132]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA13052; Tue, 21 Nov 2000 11:41:06 +1100 Date: Tue, 21 Nov 2000 11:41:06 +1100 (EST) From: Mark Goodwin X-Sender: markgw@sandpit.melbourne.sgi.com To: Michal Kara cc: pcp@oss.sgi.com Subject: Re: PMLogger question/suggestion In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-pcp@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;pcp-outgoing Content-Length: 972 Lines: 29 On Tue, 21 Nov 2000, Ken McDonell wrote: > My initial reaction (given my old fart Unix status) is that I'd > recommend using cpp to generate your pmlogger config files, rather than > adding this functionality to pmlogger (this functionality certainly is > not part of pmlogger at the moment). 
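
A hedged sketch of the generate-the-config approach for the simplest case, without cpp. The include 'file' notation is Michal's proposal from the quoted message, which pmlogger itself does not understand, so something has to expand it before pmlogger sees the result. The function name expand_includes and the one-level-only behaviour are assumptions made for illustration:

```shell
#!/bin/sh
# expand_includes FILE
# Print FILE, replacing every line of the exact form
#     include 'NAME'
# with the contents of NAME, looked up relative to FILE's directory.
# Included files are copied verbatim (no nested expansion), and all
# other lines pass through unchanged.
expand_includes() {
    dir=$(dirname "$1")
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            "include '"*"'")
                name=${line#"include '"}   # strip the leading keyword and quote
                name=${name%"'"}           # strip the trailing quote
                cat "$dir/$name"
                ;;
            *)
                printf '%s\n' "$line"
                ;;
        esac
    done < "$1"
}
```

Redirecting the output to a file gives an ordinary config that can be handed to pmlogger with -c. Anything beyond plain inclusion (conditionals, macros) is where cpp earns its keep.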
I think Michal's suggestion is a good one - it would be handy for those who have to deploy pmlogger all over the place.

In the simple case, simply cat all the configs you want together and pipe them straight into pmlogger (it reads its config on stdin if -c is not given). But there are some cpp features that may be useful, e.g. conditional constructs, macros, etc.

So you could use a simple script like this:

    #! /bin/sh
    . /etc/pcp.env
    $PCP_CPP_PROG | pmlogger "$@"

and then run it just like pmlogger with the config on stdin. This is so simple it probably doesn't warrant inclusion in the base pcp src, but please pipe up if you'd like this feature ...!

thanks
-- Mark

From owner-pcp@oss.sgi.com Tue Nov 21 12:33:43 2000 Received: by oss.sgi.com id ; Tue, 21 Nov 2000 12:33:33 -0800 Received: from pneumatic-tube.sgi.com ([204.94.214.22]:30268 "EHLO pneumatic-tube.sgi.com") by oss.sgi.com with ESMTP id ; Tue, 21 Nov 2000 12:33:19 -0800 Received: from rattle.melbourne.sgi.com (rattle.melbourne.sgi.com [134.14.55.145]) by pneumatic-tube.sgi.com (980327.SGI.8.8.8-aspam/980310.SGI-aspam) via ESMTP id MAA03535 for ; Tue, 21 Nov 2000 12:41:11 -0800 (PST) mail_from (kenmcd@melbourne.sgi.com) Received: from localhost (kenmcd@localhost) by rattle.melbourne.sgi.com (SGI-8.9.3/8.9.3) with ESMTP id HAA30594 for ; Wed, 22 Nov 2000 07:32:00 +1100 (AEDT) X-Authentication-Warning: rattle.melbourne.sgi.com: kenmcd owned process doing -bs Date: Wed, 22 Nov 2000 07:32:00 +1100 From: Ken McDonell Reply-To: kenmcd@sgi.com To: pcp@oss.sgi.com Subject: Contributed PCP software Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-pcp@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;pcp-outgoing Content-Length: 763 Lines: 20

To foster PCP development in the wider community, we are establishing a download area on oss.sgi.com for distributions of non-SGI software that enhances or extends PCP.
Given the APIs that the open source PCP package exposes, we'd expect this contributed software to include things like: - new collector plugins (PMDAs) - new monitoring tools - clever re-use of existing PCP pieces to solve new performance management tasks - mirroring of PCP packages that are also available elsewhere If you have candidates for inclusion in the PCP contributed software collection, send mail to pcp@oss.sgi.com so others can see what you've got to offer, and someone in the PCP team within SGI will get back to you and organize the logistics. Thanks. From owner-pcp@oss.sgi.com Tue Nov 21 19:10:34 2000 Received: by oss.sgi.com id ; Tue, 21 Nov 2000 19:10:14 -0800 Received: from deliverator.sgi.com ([204.94.214.10]:21046 "EHLO deliverator.sgi.com") by oss.sgi.com with ESMTP id ; Tue, 21 Nov 2000 19:09:45 -0800 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by deliverator.sgi.com (980309.SGI.8.8.8-aspam-6.2/980310.SGI-aspam) via SMTP id TAA26739 for ; Tue, 21 Nov 2000 19:01:51 -0800 (PST) mail_from (markgw@sgi.com) Received: from sandpit.melbourne.sgi.com (sandpit.melbourne.sgi.com [134.14.55.132]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA22808; Wed, 22 Nov 2000 14:08:22 +1100 Date: Wed, 22 Nov 2000 14:08:22 +1100 (EST) From: Mark Goodwin X-Sender: markgw@sandpit.melbourne.sgi.com To: Michal Kara cc: pcp@oss.sgi.com Subject: Re: PMLogger question/suggestion In-Reply-To: <20001121083613.A32319@arthur.plbohnice.cz> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-pcp@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;pcp-outgoing Content-Length: 2507 Lines: 62 On Tue, 21 Nov 2000, Michal Kara wrote: > > > My initial reaction (given my old fart Unix status) is that I'd > > > recommend using cpp to generate your pmlogger config files, rather than > > > adding this functionality to pmlogger (this functionality certainly is > > > not part of 
pmlogger at the moment). > > Or m4... It is not a bad idea. I personally find m4 to be a pain, but others don't of course. > > > In the simple case, simply cat all the configs you want together > > and pipe them straight into pmlogger (it reads it's config on stdin > > if -c is not given). But there are some cpp features that may be > > useful, e.g. conditional constructs, macros, etc. > > > > So you could use a simple script like this: > > ... > > I think it would be more complicated if you want to run more pmloggers, you'd > have to do something like: > > file=$1 > shift > cpp $file | pmlogger $* > > I will probably write my solution using makefile, it seems better for me. From what I can gather, you should use a Makefile to drive cpp in the /var/pcp/config/pmlogger directory. > > BTW, I have two more features I miss in pmlogger: > > Ability to simply restart pmlogger for given machine. Maybe it is already > there(?). Currently I have to go to pmlc, connect to loggers and determine which > connects which machine. But it is just a suggestion. I think all of this pmlogger management stuff you need is already available. Check out the man page for pmlogger_check(1). This is an extensible cron based pcp archive log management infrastructure driven by the control file in /var/pcp/config/pmlogger/control. Suitable/example cron entries are in /var/pcp/config/pmlogger/crontab. > > Second feature is to be able to tell pmlogger to behave more like cron - when > you have metric A gathered every 30 seconds and metric B every 60 seconds. > Currently, it can happen that the metric A is gathered on seconds 10 and 40 and > metric B on second 15 which results in more requests and sometimes the pmda must > do its work twice. The reason I need this is that pmcd on the machines I monitor > currently uses 5-10% of the CPU and I think this feature would lower the load. > That is far too much CPU time for such a low fetch rate.
Can you send me your pmlogger configs and/or gprof /var/pcp/pmdas/linux/pmdalinux (this is the daemon form of the linux PMDA dso, so you can profile it. You'll need to edit /var/pcp/config/pmcd/pmcd.conf to change from DSO to daemon). thanks -- Mark From owner-pcp@oss.sgi.com Wed Nov 22 00:38:14 2000 Received: by oss.sgi.com id ; Wed, 22 Nov 2000 00:38:04 -0800 Received: from deliverator.sgi.com ([204.94.214.10]:35701 "EHLO deliverator.sgi.com") by oss.sgi.com with ESMTP id ; Wed, 22 Nov 2000 00:37:54 -0800 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by deliverator.sgi.com (980309.SGI.8.8.8-aspam-6.2/980310.SGI-aspam) via SMTP id AAA02621 for ; Wed, 22 Nov 2000 00:29:56 -0800 (PST) mail_from (markgw@sgi.com) Received: from sandpit.melbourne.sgi.com (sandpit.melbourne.sgi.com [134.14.55.132]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id TAA24655; Wed, 22 Nov 2000 19:35:13 +1100 Date: Wed, 22 Nov 2000 19:35:13 +1100 (EST) From: Mark Goodwin X-Sender: markgw@sandpit.melbourne.sgi.com To: Michal Kara cc: pcp@oss.sgi.com Subject: Re: PMLogger question/suggestion In-Reply-To: <20001122090114.B5811@arthur.plbohnice.cz> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-pcp@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;pcp-outgoing Content-Length: 2052 Lines: 49 On Wed, 22 Nov 2000, Michal Kara wrote: > > From what I can gather, you should use a Makefile to drive cpp in the > > /var/pcp/config/pmlogger directory. > > I have created such Makefile. It is universal - it takes all .src files and > preprocesses them and it also automagically handles dependencies. I can send it > to you, if you are interested. yes please. There would be no harm in including it in the pcp distro. As with your previous contributions, you need to give it to SGI before I can include it in the distribution. 
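
For readers following the thread, here is a minimal shell sketch of the idea being discussed: run every .src file in a directory through a preprocessor to produce a pmlogger config. This is not Michal's Makefile (which is not shown in the thread) and it does no dependency tracking. The function name build_configs, the PP variable, the choice of stripping the .src suffix for the output name, and the "cpp -P" default are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper, not part of PCP: preprocess every *.src file in a
# directory. PP is the preprocessor command; "cpp -P" is an assumed default.
PP=${PP:-"cpp -P"}

build_configs() {
    dir=$1
    for src in "$dir"/*.src; do
        [ -f "$src" ] || continue      # no .src files: the glob stays literal
        out=${src%.src}                # assumed naming: foo.src -> foo
        # $PP is deliberately unquoted so "cpp -P" splits into command + flag
        $PP "$src" > "$out" || return 1
        echo "built $out"
    done
}
```

It would be called as `build_configs /var/pcp/config/pmlogger`, using the directory named above; a Makefile has the advantage of rebuilding only what changed.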
> > > I think all of this pmlogger management stuff you need is already > > available. Check out the man page for pmlogger_check(1). This is an > > extensible cron based pcp archive log management infrastructure driven > > by the control file in /var/pcp/config/pmlogger/control. Suitable/example > > cron entries are in /var/pcp/config/pmlogger/crontab. > > What I wanted to have is command "restart pmlogger for server > host1.domain.com". But it is not too complicated, I realized that the names of > the servers are in commandline of the process, so "ps x | grep pmlogger" and > then kill the process works. So this is a "kill and then restart" situation, right? pmlogger_check will only start the pmloggers if they have died or are not running ... > (I use pmlogger_check and pmlogger_daily and also > my own script to merge all logs for each of the monitored hosts (made when pcp > restarted or pmda went down...)) pmlogger_daily and/or pmlogger_merge should be enough to merge the daily logs .. but knowing you, your own script was necessary ;-) > > > That is far too much CPU time for such a low fetch rate. Can you send me your > > pmlogger configs and/or gprof /var/pcp/pmdas/linux/pmdalinux (this is the > > daemon form of the linux PMDA dso, so you can profile it. You'll need to > > edit /var/pcp/config/pmcd/pmcd.conf to change from DSO to daemon). > > Hmpf, it is much better today (???) Maybe some kind of error... If it appears > again, I will investigate it...
> ok thanks -- Mark From owner-pcp@oss.sgi.com Mon Nov 27 17:29:04 2000 Received: by oss.sgi.com id ; Mon, 27 Nov 2000 17:28:54 -0800 Received: from pneumatic-tube.sgi.com ([204.94.214.22]:32372 "EHLO pneumatic-tube.sgi.com") by oss.sgi.com with ESMTP id ; Mon, 27 Nov 2000 17:28:31 -0800 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by pneumatic-tube.sgi.com (980327.SGI.8.8.8-aspam/980310.SGI-aspam) via SMTP id RAA07987 for ; Mon, 27 Nov 2000 17:36:30 -0800 (PST) mail_from (markgw@sgi.com) Received: from sandpit.melbourne.sgi.com (sandpit.melbourne.sgi.com [134.14.55.132]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA00393; Tue, 28 Nov 2000 12:27:12 +1100 Date: Tue, 28 Nov 2000 12:27:11 +1100 (EST) From: Mark Goodwin X-Sender: markgw@sandpit.melbourne.sgi.com To: pcp@oss.sgi.com cc: sgi.engr.pcp@engr.sgi.com, ptg@larry.melbourne.sgi.com Subject: pcp-2.1.11-6 now available Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-pcp@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;pcp-outgoing Content-Length: 2004 Lines: 42 pcp-2.1.11-6 is now available. This is an intermediate bugfix and testing release before the next major release (around March timeframe). As usual we'd like to thank those that have contributed, in particular Michal Kara for contributing the new Apache agent (see the file /var/pcp/pmdas/apache/README for details) and Laurent Demailly for allowing us to re-license his http_lib sources as LGPL. 
The binary and src RPMs and tarballs are available from (note: dev subdir)

    ftp://oss.sgi.com/www/projects/pcp/download/dev/

Changes since the last release (pcp-2.1.10-8) include:

 - don't include linux/kernel_stat.h and avoid __sparc__ conditional code
 - from Michal Kara: rc will rebuild PMNS if root_* files newer than root
 - add the roomtemp PMDA for measuring temperatures using the 1-Wire serial network and sensor technology from Dallas Semiconductor
 - zero network.tcpconn values before counting them in /proc/net/tcp (Michal Kara's original code was correct - markgw busted it!)
 - add new LGPL library libpcp_http. Used by permission of the author, Laurent Demailly
 - minor surgery on apache PMDA to link with -lpcp_http
 - minor fix diagnostic from __pmLogRead
 - as reported by Alexander L. Belikoff, it was not possible to disable the primary logger via changes to the /var/pcp/config/pmlogger/control file ... this has been fixed
 - as reported by Alan Bailey, the assumption that /var/pcp/config/pmlogger/control was version 1.1 was implicit ... this is now documented and the pmlogger_* scripts will warn if the deprecated version 1.0 format is used accidentally
 - from Michal Kara: fix mem leak in apache PMDA
 - from Michal Kara: install /var/pcp/config/pmlogger/Makefile (src is in src/pmlogctl/Makefile.install). This provides pre-processing of pmlogger config files with cpp.
thanks -- Mark Goodwin SGI Engineering From owner-pcp@oss.sgi.com Thu Nov 30 11:35:40 2000 Received: by oss.sgi.com id ; Thu, 30 Nov 2000 11:35:30 -0800 Received: from ex1.ncsa.uiuc.edu ([141.142.2.9]:36236 "EHLO ex1.ncsa.uiuc.edu") by oss.sgi.com with ESMTP id ; Thu, 30 Nov 2000 11:35:09 -0800 Received: from mx1.ncsa.uiuc.edu (mx1.ncsa.uiuc.edu [141.142.2.8]) by ex1.ncsa.uiuc.edu (8.11.0/8.11.0) with ESMTP id eAUJZ8S04110 for ; Thu, 30 Nov 2000 13:35:08 -0600 (CST) X-Envelope-From: abailey@ncsa.uiuc.edu X-Envelope-To: Received: from osage.ncsa.uiuc.edu (osage.ncsa.uiuc.edu [141.142.2.56]) by mx1.ncsa.uiuc.edu (8.11.0/8.11.0) with ESMTP id eAUJZ7B03717 for ; Thu, 30 Nov 2000 13:35:07 -0600 (CST) Received: from localhost (abailey@localhost) by osage.ncsa.uiuc.edu (8.9.3/8.9.3) with ESMTP id NAA08214 for ; Thu, 30 Nov 2000 13:35:07 -0600 Date: Thu, 30 Nov 2000 13:35:07 -0600 (CST) From: Alan Bailey To: pcp@oss.sgi.com Subject: weird error Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-pcp@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;pcp-outgoing Content-Length: 1542 Lines: 68 I don't know if this is a problem with the simple pmda, or with dynamic shared objects on linux, or with something I might have done. But it's weird, that's for sure :) I'm trying to use the simple pmda as a DSO on linux 2.2.14 from pcp-2.1.10. I haven't changed pcp or the simple pmda. Here's output to display the first problem. It seems like when mem.freemem and simple.numfetch are queried at the same time, they both get the value of simple.numfetch. lanner % pminfo -f -h localhost mem.freemem mem.freemem value 1652 lanner % pminfo -f -h localhost simple.numfetch simple.numfetch value 2 lanner % pminfo -f -h localhost simple.numfetch mem.freemem simple.numfetch value 3 mem.freemem value 3 lanner % pminfo -f -h localhost mem.freemem mem.freemem value 1648 Here's some other weirdness. 
It takes on the value of simple.color, even though that has three instances! Weird... lanner % pminfo -f -h localhost mem.freemem mem.freemem value 2076 lanner % pminfo -f -h localhost simple.color simple.color inst [0 or "red"] value 2 inst [1 or "green"] value 102 inst [2 or "blue"] value 202 lanner % pminfo -f -h localhost simple.color mem.freemem simple.color inst [0 or "red"] value 3 inst [1 or "green"] value 103 inst [2 or "blue"] value 203 mem.freemem value 3 value 103 value 203 That should be enough of a description for a diagnosis. I apologize if this is a simple error or something done wrong on my part. Alan -- Alan Bailey From owner-pcp@oss.sgi.com Thu Nov 30 14:02:31 2000 Received: by oss.sgi.com id ; Thu, 30 Nov 2000 14:02:21 -0800 Received: from deliverator.sgi.com ([204.94.214.10]:54637 "EHLO deliverator.sgi.com") by oss.sgi.com with ESMTP id ; Thu, 30 Nov 2000 14:01:57 -0800 Received: from rattle.melbourne.sgi.com (rattle.melbourne.sgi.com [134.14.55.145]) by deliverator.sgi.com (980309.SGI.8.8.8-aspam-6.2/980310.SGI-aspam) via ESMTP id OAA16573 for ; Thu, 30 Nov 2000 14:01:56 -0800 (PST) mail_from (kenmcd@melbourne.sgi.com) Received: from localhost (kenmcd@localhost) by rattle.melbourne.sgi.com (SGI-8.9.3/8.9.3) with ESMTP id JAA79842; Fri, 1 Dec 2000 09:00:38 +1100 (AEDT) X-Authentication-Warning: rattle.melbourne.sgi.com: kenmcd owned process doing -bs Date: Fri, 1 Dec 2000 09:00:38 +1100 From: Ken McDonell Reply-To: kenmcd@sgi.com To: Alan Bailey cc: pcp@oss.sgi.com Subject: Re: weird error In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-pcp@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;pcp-outgoing Content-Length: 1319 Lines: 38 On Thu, 30 Nov 2000, Alan Bailey wrote: > I don't know if this is a problem with the simple pmda, or with dynamic > shared objects on linux, or with something I might have done. But it's > weird, that's for sure :) > > ... 
>
> That should be enough of a description for a diagnosis. I apologize if
> this is a simple error or something done wrong on my part.

This is most unlikely to be your problem ... I'm investigating.

Just as an aside, there is a generalized tracing and debugging mechanism in the PCP applications, libraries and daemons. Check out the pmdbg man page, or run

    pmdbg -l

Most commands take a -D option with the argument being a numerical value of the bit-wise or of some debug flags, or a comma separated list of the debug flag names (stripped of the leading DBG_TRACE_) in upper or lower case, e.g.

    $ pminfo -D profile,pdu -f simple.color

You can use pmstore and the metric pmcd.control.debug to turn tracing on/off for pmcd, e.g.

    $ pmstore pmcd.control.debug 5

turns on PDU and PROFILE tracing for pmcd, and the diagnostics will be written to /var/log/pcp/pmcd/pmcd.log

In a case like the one above, seeing the Protocol Data Units (PDUs) exchanged between pminfo and pmcd would be our first port of call in debugging the problem, so adding this info to bug reports would be helpful.
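
To make the "bit-wise or of some debug flags" concrete, here is a small sketch of how a numeric value such as 5 is built from individual flags. The message only says that 5 means PDU plus PROFILE; the split into 1 for PDU and 4 for PROFILE is an assumption here, and `pmdbg -l` on the installed version is the place to check the real values. The function name debug_mask is made up for this example.

```shell
#!/bin/sh
# Assumed flag values; confirm them with `pmdbg -l` on your installation.
DBG_PDU=1        # assumption: PDU tracing bit
DBG_PROFILE=4    # assumption: instance profile tracing bit

# OR together any number of flag values and print the resulting mask.
debug_mask() {
    mask=0
    for flag in "$@"; do
        mask=$(( mask | flag ))
    done
    echo "$mask"
}

debug_mask "$DBG_PDU" "$DBG_PROFILE"   # prints 5
```

The result could then be handed to pmstore as in the message above, e.g. `pmstore pmcd.control.debug "$(debug_mask "$DBG_PDU" "$DBG_PROFILE")"`.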
From owner-pcp@oss.sgi.com Thu Nov 30 14:20:01 2000 Received: by oss.sgi.com id ; Thu, 30 Nov 2000 14:19:42 -0800 Received: from ex1.ncsa.uiuc.edu ([141.142.2.9]:24483 "EHLO ex1.ncsa.uiuc.edu") by oss.sgi.com with ESMTP id ; Thu, 30 Nov 2000 14:19:13 -0800 Received: from mx1.ncsa.uiuc.edu (mx1.ncsa.uiuc.edu [141.142.2.8]) by ex1.ncsa.uiuc.edu (8.11.0/8.11.0) with ESMTP id eAUMJBS16660; Thu, 30 Nov 2000 16:19:11 -0600 (CST) X-Envelope-From: abailey@ncsa.uiuc.edu Received: from osage.ncsa.uiuc.edu (osage.ncsa.uiuc.edu [141.142.2.56]) by mx1.ncsa.uiuc.edu (8.11.0/8.11.0) with ESMTP id eAUMJAB12609; Thu, 30 Nov 2000 16:19:10 -0600 (CST) Received: from localhost (abailey@localhost) by osage.ncsa.uiuc.edu (8.9.3/8.9.3) with ESMTP id QAA09705; Thu, 30 Nov 2000 16:19:10 -0600 Date: Thu, 30 Nov 2000 16:19:10 -0600 (CST) From: Alan Bailey To: kenmcd@sgi.com cc: pcp@oss.sgi.com Subject: Re: weird error In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-pcp@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;pcp-outgoing Content-Length: 7344 Lines: 191 Thanks for the debugging information, it's helpful. I forgot to mention in the first email that the simple pmda works just fine as a daemon. Without further ado, here are three pminfo commands with the -D profile,pdu options. The first is just for simple.numfetch, the second is for simple.color, and the third is for simple.color and mem.freemem: ------ lanner % pminfo -f -D profile,pdu simple.numfetch [3068]pmGetPDU: ERROR fd=3 len=20 from=1685 moreinput? no 000: 14 7000 695 0 102 [3068]pmXmitPDU: CREDS fd=3 len=20 000: 14 700c bfc 1000000 201 [3068]pmXmitPDU: PMNS_TRAVERSE fd=3 len=36 000: 24 7010 bfc 0 f000000 706d6973 6e2e656c 65666d75 008: 686374 [3068]pmGetPDU: PMNS_NAMES fd=3 len=44 from=1685 moreinput? 
no 000: 2c 700e 695 10000000 0 1000000 f000000 706d6973 008: 6e2e656c 65666d75 7e686374 [3068]pmXmitPDU: PMNS_NAMES fd=3 len=44 000: 2c 700e bfc 10000000 0 1000000 f000000 706d6973 008: 6e2e656c 65666d75 7e686374 [3068]pmGetPDU: PMNS_IDS fd=3 len=24 from=1685 moreinput? no 000: 18 700d 695 1000000 1000000 403f pmFetch: calling __pmSendProfile, context: 0 Dump Instance Profile state=INCLUDE, 0 profiles [3068]pmXmitPDU: PROFILE fd=3 len=28 000: 1c 7002 bfc 0 0 0 0 [3068]pmXmitPDU: FETCH fd=3 len=32 000: 20 7003 bfc 0 0 0 1000000 403f [3068]pmGetPDU: RESULT fd=3 len=44 from=1685 moreinput? no 000: 2c 7001 695 e0d0263a 93720a00 1000000 403f 1000000 008: 0 ffffffff 6000000 pmResult dump from 0x804e168 timestamp: 975622368.684691 16:12:48.684 numpmid: 1 253.0.0 (simple.numfetch): numval: 1 valfmt: 0 vlist[]: value 6 [3068]pmXmitPDU: DESC_REQ fd=3 len=16 000: 10 7004 bfc 403f [3068]pmGetPDU: DESC fd=3 len=32 from=1685 moreinput? no 000: 20 7005 695 403f 1000000 ffffffff 3000000 0 simple.numfetch value 6 ------- lanner % pminfo -f -D profile,pdu simple.color [3070]pmGetPDU: ERROR fd=3 len=20 from=1685 moreinput? no 000: 14 7000 695 0 102 [3070]pmXmitPDU: CREDS fd=3 len=20 000: 14 700c bfe 1000000 201 [3070]pmXmitPDU: PMNS_TRAVERSE fd=3 len=32 000: 20 7010 bfe 0 c000000 706d6973 632e656c 726f6c6f [3070]pmGetPDU: PMNS_NAMES fd=3 len=40 from=1685 moreinput? no 000: 28 700e 695 d000000 0 1000000 c000000 706d6973 008: 632e656c 726f6c6f [3070]pmXmitPDU: PMNS_NAMES fd=3 len=40 000: 28 700e bfe d000000 0 1000000 c000000 706d6973 008: 632e656c 726f6c6f [3070]pmGetPDU: PMNS_IDS fd=3 len=24 from=1685 moreinput? no 000: 18 700d 695 1000000 1000000 100403f pmFetch: calling __pmSendProfile, context: 0 Dump Instance Profile state=INCLUDE, 0 profiles [3070]pmXmitPDU: PROFILE fd=3 len=28 000: 1c 7002 bfe 0 0 0 0 [3070]pmXmitPDU: FETCH fd=3 len=32 000: 20 7003 bfe 0 0 0 1000000 100403f [3070]pmGetPDU: RESULT fd=3 len=60 from=1685 moreinput? 
no 000: 3c 7001 695 24d1263a cf6b0800 1000000 100403f 3000000 008: 0 0 6000000 1000000 6a000000 2000000 ce000000 pmResult dump from 0x804e168 timestamp: 975622436.551887 16:13:56.551 numpmid: 1 253.0.1 (simple.color): numval: 3 valfmt: 0 vlist[]: inst [0 or "red"] value 6 inst [1 or "green"] value 106 inst [2 or "blue"] value 206 [3070]pmXmitPDU: DESC_REQ fd=3 len=16 000: 10 7004 bfe 100403f [3070]pmGetPDU: DESC fd=3 len=32 from=1685 moreinput? no 000: 20 7005 695 100403f 0 403f 3000000 0 simple.color [3070]pmXmitPDU: INSTANCE_REQ fd=3 len=32 000: 20 7006 bfe 403f 0 0 ffffffff 0 [3070]pmGetPDU: INSTANCE fd=3 len=60 from=1685 moreinput? no 000: 3c 7007 695 403f 3000000 0 3000000 7e646572 008: 1000000 5000000 65657267 7e7e7e6e 2000000 4000000 65756c62 inst [0 or "red"] value 6 inst [1 or "green"] value 106 inst [2 or "blue"] value 206 ----------- lanner % pminfo -f -D profile,pdu simple.color mem.freemem [3305]pmGetPDU: ERROR fd=3 len=20 from=3285 moreinput? no 000: 14 7000 cd5 0 102 [3305]pmXmitPDU: CREDS fd=3 len=20 000: 14 700c ce9 1000000 201 [3305]pmXmitPDU: PMNS_TRAVERSE fd=3 len=32 000: 20 7010 ce9 0 c000000 706d6973 632e656c 726f6c6f [3305]pmGetPDU: PMNS_NAMES fd=3 len=40 from=3285 moreinput? no 000: 28 700e cd5 d000000 0 1000000 c000000 706d6973 008: 632e656c 726f6c6f [3305]pmXmitPDU: PMNS_TRAVERSE fd=3 len=32 000: 20 7010 ce9 0 b000000 2e6d656d 65657266 706d656d [3305]pmGetPDU: PMNS_NAMES fd=3 len=40 from=3285 moreinput? no 000: 28 700e cd5 c000000 0 1000000 b000000 2e6d656d 008: 65657266 7e6d656d [3305]pmXmitPDU: PMNS_NAMES fd=3 len=56 000: 38 700e ce9 19000000 0 2000000 c000000 706d6973 008: 632e656c 726f6c6f b000000 2e6d656d 65657266 7e6d656d [3305]pmGetPDU: PMNS_IDS fd=3 len=28 from=3285 moreinput? 
no 000: 1c 700d cd5 2000000 2000000 100403f a04000f pmFetch: calling __pmSendProfile, context: 0 Dump Instance Profile state=INCLUDE, 0 profiles [3305]pmXmitPDU: PROFILE fd=3 len=28 000: 1c 7002 ce9 0 0 0 0 [3305]pmXmitPDU: FETCH fd=3 len=36 000: 24 7003 ce9 0 0 0 2000000 100403f 008: a04000f [3305]pmGetPDU: RESULT fd=3 len=96 from=3285 moreinput? no 000: 60 7001 cd5 e2d1263a caea0c00 2000000 100403f 3000000 008: 0 0 4000000 1000000 68000000 2000000 cc000000 100403f 016: 3000000 0 0 4000000 1000000 68000000 2000000 cc000000 pmResult dump from 0x804e168 timestamp: 975622626.846538 16:17:06.846 numpmid: 2 253.0.1 (simple.color): numval: 3 valfmt: 0 vlist[]: inst [0 or "red"] value 4 inst [1 or "green"] value 104 inst [2 or "blue"] value 204 253.0.1 (simple.color): numval: 3 valfmt: 0 vlist[]: inst [0 or "red"] value 4 inst [1 or "green"] value 104 inst [2 or "blue"] value 204 [3305]pmXmitPDU: DESC_REQ fd=3 len=16 000: 10 7004 ce9 100403f [3305]pmGetPDU: DESC fd=3 len=32 from=3285 moreinput? no 000: 20 7005 cd5 100403f 0 403f 3000000 0 simple.color [3305]pmXmitPDU: INSTANCE_REQ fd=3 len=32 000: 20 7006 ce9 403f 0 0 ffffffff 0 [3305]pmGetPDU: INSTANCE fd=3 len=60 from=3285 moreinput? no 000: 3c 7007 cd5 403f 3000000 0 3000000 7e646572 008: 1000000 5000000 65657267 7e7e7e6e 2000000 4000000 65756c62 inst [0 or "red"] value 4 inst [1 or "green"] value 104 inst [2 or "blue"] value 204 [3305]pmXmitPDU: DESC_REQ fd=3 len=16 000: 10 7004 ce9 a04000f [3305]pmGetPDU: DESC fd=3 len=32 from=3285 moreinput? 
no 000: 20 7005 cd5 a04000f 1000000 ffffffff 3000000 110 mem.freemem value 4 value 104 value 204 ---------- Thanks, Alan -- Alan Bailey From owner-pcp@oss.sgi.com Thu Nov 30 14:36:51 2000 Received: by oss.sgi.com id ; Thu, 30 Nov 2000 14:36:41 -0800 Received: from deliverator.sgi.com ([204.94.214.10]:5382 "EHLO deliverator.sgi.com") by oss.sgi.com with ESMTP id ; Thu, 30 Nov 2000 14:36:23 -0800 Received: from rattle.melbourne.sgi.com (rattle.melbourne.sgi.com [134.14.55.145]) by deliverator.sgi.com (980309.SGI.8.8.8-aspam-6.2/980310.SGI-aspam) via ESMTP id OAA26315 for ; Thu, 30 Nov 2000 14:36:21 -0800 (PST) mail_from (kenmcd@melbourne.sgi.com) Received: from localhost (kenmcd@localhost) by rattle.melbourne.sgi.com (SGI-8.9.3/8.9.3) with ESMTP id JAA86225; Fri, 1 Dec 2000 09:33:49 +1100 (AEDT) X-Authentication-Warning: rattle.melbourne.sgi.com: kenmcd owned process doing -bs Date: Fri, 1 Dec 2000 09:33:48 +1100 From: Ken McDonell Reply-To: kenmcd@sgi.com To: Alan Bailey cc: pcp@oss.sgi.com Subject: Re: weird error In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-pcp@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;pcp-outgoing Content-Length: 2069 Lines: 60 OK, you are faster than I am! The problem is clearly reproducible ... and the diagnosis is below On Thu, 30 Nov 2000, Alan Bailey wrote: > Thanks for the debugging information, it's helpful. I forgot to mention > in the first email that the simple pmda works just fine as a daemon. Yes, I've discovered this too and there's a strong hint there. > Without further ado, here are three pminfo commands with the -D > profile,pdu options. The first is just for simple.numfetch, the second is > for simple.color, and the third is for simple.color and mem.freemem: > > ... [OK stuff deleted] > lanner % pminfo -f -D profile,pdu simple.color mem.freemem > ... 
> [3305]pmXmitPDU: FETCH fd=3 len=36
> 000: 24 7003 ce9 0 0 0 2000000 100403f
> 008: a04000f

Note 0x100403f is the PMID for simple.color and 0xa04000f is the PMID for mem.freemem being sent from pminfo to pmcd.

> [3305]pmGetPDU: RESULT fd=3 len=96 from=3285 moreinput? no
> 000: 60 7001 cd5 e2d1263a caea0c00 2000000 100403f 3000000
> 008: 0 0 4000000 1000000 68000000 2000000 cc000000 100403f
> 016: 3000000 0 0 4000000 1000000 68000000 2000000 cc000000
> pmResult dump from 0x804e168 timestamp: 975622626.846538 16:17:06.846
> numpmid: 2 253.0.1 (simple.color): numval: 3 valfmt: 0 vlist[]:
> inst [0 or "red"] value 4
> inst [1 or "green"] value 104
> inst [2 or "blue"] value 204
> 253.0.1 (simple.color): numval: 3 valfmt: 0 vlist[]:
> inst [0 or "red"] value 4
> inst [1 or "green"] value 104
> inst [2 or "blue"] value 204

When the answer comes back, PMID 0xa04000f has vanished and 0x100403f appears twice ... this is BOGUS!

> ...
>
> mem.freemem
> value 4
> value 104
> value 204

pminfo uses the name of the second metric (mem.freemem) on the reasonable assumption that the PMID should match.

The problem is that mem.* and simple.* are in two different DSO agents, which is why making simple a daemon makes the problem go away.

Expect a fix real soon.
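
For anyone wanting to match the hex words in these dumps against the dotted PMIDs in the pmResult output, the following sketch converts one into the other. It rests on two assumptions that are not stated in the thread: that the dump shows network-order words as read on a little-endian host (so the bytes need swapping), and that a PMID packs domain, cluster and item into 9, 12 and 10 bits. Under those assumptions it reproduces the 253.0.1 shown for simple.color and the 253.0.0 shown for simple.numfetch. The function name decode_pmid is made up for this example.

```shell
#!/bin/sh
# decode_pmid WORD: WORD is a hex word exactly as printed in the PDU dump.
# Assumptions: bytes appear swapped (little-endian host, network-order data)
# and the PMID layout is domain:9, cluster:12, item:10 bits.
decode_pmid() {
    w=$(( 0x$1 ))
    # swap the four bytes of the 32-bit word
    n=$(( ((w & 0xff) << 24) | ((w & 0xff00) << 8) | ((w >> 8) & 0xff00) | ((w >> 24) & 0xff) ))
    domain=$(( (n >> 22) & 0x1ff ))
    cluster=$(( (n >> 10) & 0xfff ))
    item=$(( n & 0x3ff ))
    echo "$domain.$cluster.$item"
}

decode_pmid 100403f    # prints 253.0.1 (simple.color in the dump above)
decode_pmid a04000f    # prints 60.1.10 (the word sent for mem.freemem)
```

Run against the RESULT PDU, both PMID words decode to 253.0.1, which is the duplication described above.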
From owner-pcp@oss.sgi.com Thu Nov 30 15:37:52 2000 Received: by oss.sgi.com id ; Thu, 30 Nov 2000 15:37:42 -0800 Received: from ex1.ncsa.uiuc.edu ([141.142.2.9]:64682 "EHLO ex1.ncsa.uiuc.edu") by oss.sgi.com with ESMTP id ; Thu, 30 Nov 2000 15:37:28 -0800 Received: from mx1.ncsa.uiuc.edu (mx1.ncsa.uiuc.edu [141.142.2.8]) by ex1.ncsa.uiuc.edu (8.11.0/8.11.0) with ESMTP id eAUNbFS21817; Thu, 30 Nov 2000 17:37:16 -0600 (CST) X-Envelope-From: abailey@ncsa.uiuc.edu Received: from osage.ncsa.uiuc.edu (osage.ncsa.uiuc.edu [141.142.2.56]) by mx1.ncsa.uiuc.edu (8.11.0/8.11.0) with ESMTP id eAUNbFB26529; Thu, 30 Nov 2000 17:37:15 -0600 (CST) Received: from localhost (abailey@localhost) by osage.ncsa.uiuc.edu (8.9.3/8.9.3) with ESMTP id RAA10443; Thu, 30 Nov 2000 17:37:15 -0600 Date: Thu, 30 Nov 2000 17:37:15 -0600 (CST) From: Alan Bailey To: kenmcd@sgi.com cc: pcp@oss.sgi.com Subject: Re: weird error In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-pcp@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;pcp-outgoing Content-Length: 2450 Lines: 75 Awesome! (Well, not that there is a problem, but that you found it quickly). I'll be anxiously awaiting your update :) I really appreciate all the work... Thanks a bunch for open source PCP. Alan On Fri, 1 Dec 2000, Ken McDonell wrote: > OK, you are faster than I am! > > The problem is clearly reproducible ... and the diagnosis is below > > On Thu, 30 Nov 2000, Alan Bailey wrote: > > > Thanks for the debugging information, it's helpful. I forgot to mention > > in the first email that the simple pmda works just fine as a daemon. > > Yes, I've discovered this too and there's a strong hint there. > > > Without further ado, here are three pminfo commands with the -D > > profile,pdu options. The first is just for simple.numfetch, the second is > > for simple.color, and the third is for simple.color and mem.freemem: > > > > ... 
> > [OK stuff deleted] > > > lanner % pminfo -f -D profile,pdu simple.color mem.freemem > > ... > > [3305]pmXmitPDU: FETCH fd=3 len=36 > > 000: 24 7003 ce9 0 0 0 2000000 100403f > > 008: a04000f > > Note 0x100403f is the PMID for simple.color and 0xa04000f is the PMID for > mem.freemem being sent from pminfo to pmcd. > > > [3305]pmGetPDU: RESULT fd=3 len=96 from=3285 moreinput? no > > 000: 60 7001 cd5 e2d1263a caea0c00 2000000 100403f 3000000 > > 008: 0 0 4000000 1000000 68000000 2000000 cc000000 100403f > > 016: 3000000 0 0 4000000 1000000 68000000 2000000 cc000000 > > pmResult dump from 0x804e168 timestamp: 975622626.846538 16:17:06.846 > > numpmid: 2 253.0.1 (simple.color): numval: 3 valfmt: 0 vlist[]: > > inst [0 or "red"] value 4 > > inst [1 or "green"] value 104 > > inst [2 or "blue"] value 204 > > 253.0.1 (simple.color): numval: 3 valfmt: 0 vlist[]: > > inst [0 or "red"] value 4 > > inst [1 or "green"] value 104 > > inst [2 or "blue"] value 204 > > When the answer comes back, PMID 0xa04000f has vanished and 0x100403f > appears twice ... this is BOGUS! > > > ... > > > > mem.freemem > > value 4 > > value 104 > > value 204 > > pminfo uses the name of the second metric (mem.freemem) on the reasonable > assumption that the PMID should match. > > The problem is that mem.* and simple.* are in two different DSO agents, > which is why making simple a daemon makes the problem go away. > > Expect a fix real soon. 
> -- Alan Bailey From owner-pcp@oss.sgi.com Thu Nov 30 21:19:24 2000 Received: by oss.sgi.com id ; Thu, 30 Nov 2000 21:19:04 -0800 Received: from deliverator.sgi.com ([204.94.214.10]:55821 "EHLO deliverator.sgi.com") by oss.sgi.com with ESMTP id ; Thu, 30 Nov 2000 21:18:35 -0800 Received: from rattle.melbourne.sgi.com (rattle.melbourne.sgi.com [134.14.55.145]) by deliverator.sgi.com (980309.SGI.8.8.8-aspam-6.2/980310.SGI-aspam) via ESMTP id VAA09348 for ; Thu, 30 Nov 2000 21:18:34 -0800 (PST) mail_from (kenmcd@melbourne.sgi.com) Received: from localhost (kenmcd@localhost) by rattle.melbourne.sgi.com (SGI-8.9.3/8.9.3) with ESMTP id QAA04689; Fri, 1 Dec 2000 16:17:16 +1100 (AEDT) X-Authentication-Warning: rattle.melbourne.sgi.com: kenmcd owned process doing -bs Date: Fri, 1 Dec 2000 16:17:15 +1100 From: Ken McDonell Reply-To: kenmcd@sgi.com To: Alan Bailey cc: pcp@oss.sgi.com Subject: Re: weird error In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: MULTIPART/MIXED; BOUNDARY="-2045888623-808250409-975647835=:2281230" Sender: owner-pcp@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;pcp-outgoing Content-Length: 6127 Lines: 112 This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. Send mail to mime@docserver.cac.washington.edu for more info. ---2045888623-808250409-975647835=:2281230 Content-Type: TEXT/PLAIN; charset=US-ASCII On Thu, 30 Nov 2000, Alan Bailey wrote: > Awesome! (Well, not that there is a problem, but that you found it > quickly). > > I'll be anxiously awaiting your update :) Attached is a patch for three source files in libpcp_pmda ... I believe this will fix the problem. Let us know. This will be in the next spin of the dev rpms. 
---2045888623-808250409-975647835=:2281230 Content-Type: TEXT/PLAIN; charset=US-ASCII; name=patch Content-Transfer-Encoding: BASE64 Content-ID: Content-Description: libpcp_pmda patches Content-Disposition: attachment; filename=patch LS0tIC92YXIvdG1wL3BfcmRpZmZfYTAyanRRL2NhbGxiYWNrLmMJRnJpIERl YyAgMSAxNjoxMTo0MCAyMDAwDQorKysgc3JjL2xpYnBjcF9wbWRhL3NyYy9j YWxsYmFjay5jCUZyaSBEZWMgIDEgMTI6NTc6MDQgMjAwMA0KQEAgLTI3Myw5 ICsyNzMsNiBAQA0KIGludA0KIHBtZGFGZXRjaChpbnQgbnVtcG1pZCwgcG1J RCBwbWlkbGlzdFtdLCBwbVJlc3VsdCAqKnJlc3AsIHBtZGFFeHQgKnBtZGEp DQogew0KLSAgICBzdGF0aWMgaW50CQltYXhucG1pZHMgPSAwOw0KLSAgICBz dGF0aWMgcG1SZXN1bHQJKnJlcyA9IE5VTEw7DQotDQogICAgIGludAkJCWk7 CQkvKiBvdmVyIHBtaWRsaXN0W10gKi8NCiAgICAgaW50CQkJajsJCS8qIG92 ZXIgbWV0YXRhYiBhbmQgdnNldC0+dmxpc3RbXSAqLw0KICAgICBpbnQJCQlz dHM7DQpAQCAtMjg4LDIwICsyODUsMjEgQEANCiAgICAgcG1kYU1ldHJpYwkJ Km1ldGFwOw0KICAgICBwbUF0b21WYWx1ZQkJYXRvbTsNCiAgICAgaW50CQkJ dHlwZTsNCisgICAgZV9leHRfdAkJKmV4dHAgPSAoZV9leHRfdCAqKXBtZGEt PmVfZXh0Ow0KIA0KLSAgICBpZiAobnVtcG1pZCA+IG1heG5wbWlkcykgew0K LQlpZiAocmVzICE9IE5VTEwpDQotCSAgICBmcmVlKHJlcyk7DQorICAgIGlm IChudW1wbWlkID4gZXh0cC0+bWF4bnBtaWRzKSB7DQorCWlmIChleHRwLT5y ZXMgIT0gTlVMTCkNCisJICAgIGZyZWUoZXh0cC0+cmVzKTsNCiAJLyogKG51 bXBtaWQgLSAxKSBiZWNhdXNlIHRoZXJlJ3Mgcm9vbSBmb3Igb25lIHZhbHVl U2V0IGluIGEgcG1SZXN1bHQgKi8NCiAJbmVlZCA9IChpbnQpc2l6ZW9mKHBt UmVzdWx0KSArIChudW1wbWlkIC0gMSkgKiAoaW50KXNpemVvZihwbVZhbHVl U2V0ICopOw0KLQlpZiAoKHJlcyA9IChwbVJlc3VsdCAqKSBtYWxsb2MobmVl ZCkpID09IE5VTEwpDQorCWlmICgoZXh0cC0+cmVzID0gKHBtUmVzdWx0ICop IG1hbGxvYyhuZWVkKSkgPT0gTlVMTCkNCiAJICAgIHJldHVybiAtZXJybm87 DQotCW1heG5wbWlkcyA9IG51bXBtaWQ7DQorCWV4dHAtPm1heG5wbWlkcyA9 IG51bXBtaWQ7DQogICAgIH0NCiANCi0gICAgcmVzLT50aW1lc3RhbXAudHZf c2VjID0gMDsNCi0gICAgcmVzLT50aW1lc3RhbXAudHZfdXNlYyA9IDA7DQot ICAgIHJlcy0+bnVtcG1pZCA9IG51bXBtaWQ7DQorICAgIGV4dHAtPnJlcy0+ dGltZXN0YW1wLnR2X3NlYyA9IDA7DQorICAgIGV4dHAtPnJlcy0+dGltZXN0 YW1wLnR2X3VzZWMgPSAwOw0KKyAgICBleHRwLT5yZXMtPm51bXBtaWQgPSBu 
dW1wbWlkOw0KIA0KICAgICBmb3IgKGkgPSAwOyBpIDwgbnVtcG1pZDsgaSsr KSB7DQogDQpAQCAtMzYwLDEzICszNTgsMTMgQEANCiANCiAJLyogTXVzdCB1 c2UgaW5kaXZpZHVhbCBtYWxsb2MoKXMgYmVjYXVzZSBvZiBwbUZyZWVSZXN1 bHQoKSAqLw0KIAlpZiAobnVtdmFsID09IDEpDQotCSAgICByZXMtPnZzZXRb aV0gPSB2c2V0ID0gKHBtVmFsdWVTZXQgKikNCisJICAgIGV4dHAtPnJlcy0+ dnNldFtpXSA9IHZzZXQgPSAocG1WYWx1ZVNldCAqKQ0KIAkgICAgCQkJCV9f cG1Qb29sQWxsb2Moc2l6ZW9mKHBtVmFsdWVTZXQpKTsNCiAJZWxzZSBpZiAo bnVtdmFsID4gMSkNCi0JICAgIHJlcy0+dnNldFtpXSA9IHZzZXQgPSAocG1W YWx1ZVNldCAqKW1hbGxvYyhzaXplb2YocG1WYWx1ZVNldCkgKw0KKwkgICAg ZXh0cC0+cmVzLT52c2V0W2ldID0gdnNldCA9IChwbVZhbHVlU2V0ICopbWFs bG9jKHNpemVvZihwbVZhbHVlU2V0KSArDQogCQkJCQkgICAgKG51bXZhbCAt IDEpKnNpemVvZihwbVZhbHVlKSk7DQogCWVsc2UNCi0JICAgIHJlcy0+dnNl dFtpXSA9IHZzZXQgPSAocG1WYWx1ZVNldCAqKW1hbGxvYyhzaXplb2YocG1W YWx1ZVNldCkgLQ0KKwkgICAgZXh0cC0+cmVzLT52c2V0W2ldID0gdnNldCA9 IChwbVZhbHVlU2V0ICopbWFsbG9jKHNpemVvZihwbVZhbHVlU2V0KSAtDQog CQkJCQkgICAgc2l6ZW9mKHBtVmFsdWUpKTsNCiAJaWYgKHZzZXQgPT0gTlVM TCkgew0KIAkgICAgc3RzID0gLWVycm5vOw0KQEAgLTM5MCw3ICszODgsNyBA QA0KIAkgICAgaWYgKGogPT0gbnVtdmFsKSB7DQogCQkvKiBtb3JlIGluc3Rh bmNlcyB0aGFuIGV4cGVjdGVkISAqLw0KIAkJbnVtdmFsKys7DQotCQlyZXMt PnZzZXRbaV0gPSB2c2V0ID0gKHBtVmFsdWVTZXQgKilyZWFsbG9jKHZzZXQs DQorCQlleHRwLT5yZXMtPnZzZXRbaV0gPSB2c2V0ID0gKHBtVmFsdWVTZXQg KilyZWFsbG9jKHZzZXQsDQogCQkJICAgIHNpemVvZihwbVZhbHVlU2V0KSAr IChudW12YWwgLSAxKSpzaXplb2YocG1WYWx1ZSkpOw0KIAkJaWYgKHZzZXQg PT0gTlVMTCkgew0KIAkJICAgIHN0cyA9IC1lcnJubzsNCkBAIC00MjcsOCAr NDI1LDYgQEANCiAJCSAqCT09IDAgPT4gbm8gdmFsdWVzDQogCQkgKgk+IDAg ID0+IE9LDQogCQkgKi8NCi0JCWVfZXh0X3QgKmV4dHAgPSAoZV9leHRfdCAq KXBtZGEtPmVfZXh0Ow0KLQ0KIAkJaWYgKGV4dHAtPnBtZGFfaW50ZXJmYWNl ID09IFBNREFfSU5URVJGQUNFXzIgfHwNCiAJCSAgICAoZXh0cC0+cG1kYV9p bnRlcmZhY2UgPT0gUE1EQV9JTlRFUkZBQ0VfMyAmJiBzdHMgPiAwKSkgew0K IA0KQEAgLTQ1MiwxNCArNDQ4LDE0IEBADQogCSAgICB2c2V0LT5udW12YWwg PSBqOw0KIA0KICAgICB9DQotICAgICpyZXNwID0gcmVzOw0KKyAgICAqcmVz cCA9IGV4dHAtPnJlczsNCiAgICAgcmV0dXJuIDA7DQogDQogIGVycm9yOg0K 
IA0KICAgICBpZiAoaSkgew0KLQlyZXMtPm51bXBtaWQgPSBpOw0KLQlfX3Bt RnJlZVJlc3VsdFZhbHVlcyhyZXMpOw0KKwlleHRwLT5yZXMtPm51bXBtaWQg PSBpOw0KKwlfX3BtRnJlZVJlc3VsdFZhbHVlcyhleHRwLT5yZXMpOw0KICAg ICB9DQogICAgIHJldHVybiBzdHM7DQogfQ0KLS0tIC92YXIvdG1wL3BfcmRp ZmZfYTAya1VzL2xpYmRlZnMuaAlGcmkgRGVjICAxIDE2OjEyOjA1IDIwMDAN CisrKyBzcmMvbGlicGNwX3BtZGEvc3JjL2xpYmRlZnMuaAlGcmkgRGVjICAx IDEyOjUwOjM1IDIwMDANCkBAIC00MCwxMCArNDAsMTMgQEANCiANCiAvKg0K ICAqIEF1eGlsbGlhcnkgc3RydWN0dXJlIHVzZWQgdG8gc2F2ZSBkYXRhIGZy b20gcG1kYURTTyBvciBwbWRhRGFlbW9uIGFuZA0KLSAqIG1ha2UgaXQgYXZh aWxhYmxlIHRvIHRoZSBvdGhlciBtZXRob2RzLg0KKyAqIG1ha2UgaXQgYXZh aWxhYmxlIHRvIHRoZSBvdGhlciBtZXRob2RzLCBhbHNvIGFzIHByaXZhdGUg cGVyIFBNREEgZGF0YQ0KKyAqIHdoZW4gbXVsdGlwbGUgRFNPIFBNREFzIGFy ZSBpbiB1c2UNCiAgKi8NCiB0eXBlZGVmIHN0cnVjdCB7DQogICAgIGludAkJ cG1kYV9pbnRlcmZhY2U7DQorICAgIHBtUmVzdWx0CSpyZXM7CQkJLyogaGln aC13YXRlciBhbGxvY2F0aW9uIGZvciAqLw0KKyAgICBpbnQJCW1heG5wbWlk czsJCS8qIHBtUmVzdWx0IGZvciBlYWNoIFBNREEgKi8NCiB9IGVfZXh0X3Q7 DQogDQogI2VuZGlmIC8qIExJQkRFRlNfSCAqLw0KLS0tIC92YXIvdG1wL3Bf cmRpZmZfYTAya0hPL29wZW4uYwlGcmkgRGVjICAxIDE2OjEyOjI0IDIwMDAN CisrKyBzcmMvbGlicGNwX3BtZGEvc3JjL29wZW4uYwlGcmkgRGVjICAxIDEy OjU3OjUzIDIwMDANCkBAIC02NjAsNiArNjYwLDggQEANCiAJcmV0dXJuOw0K ICAgICB9DQogICAgIGV4dHAtPnBtZGFfaW50ZXJmYWNlID0gaW50ZXJmYWNl Ow0KKyAgICBleHRwLT5yZXMgPSBOVUxMOw0KKyAgICBleHRwLT5tYXhucG1p ZHMgPSAwOw0KICAgICBwbWRhLT5lX2V4dCA9ICh2b2lkICopZXh0cDsNCiAN CiAgICAgcG1kYVNldFJlc3VsdENhbGxCYWNrKGRpc3BhdGNoLCBfX3BtRnJl ZVJlc3VsdFZhbHVlcyk7DQo= ---2045888623-808250409-975647835=:2281230--
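Decoded, the attached patch removes two function-level statics from pmdaFetch() in callback.c (maxnpmids and res), adds them as fields of the per-PMDA e_ext_t structure in libdefs.h, and initialises them in open.c, so that the high-water pmResult allocation is private to each PMDA when multiple DSO PMDAs are in use. What follows is a minimal, simplified sketch of that pattern only, not the libpcp_pmda code itself: ctx_t, ctx_init and ctx_get_buffer are hypothetical names chosen for illustration, and a plain int buffer stands in for the pmResult.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical per-instance state. It stands in for the two fields
 * the patch adds to e_ext_t (res and maxnpmids).
 */
typedef struct {
    int *res;   /* high-water buffer owned by this instance */
    int  maxn;  /* number of slots currently allocated in res */
} ctx_t;

/* Start each instance with no buffer, as the patch does in open.c. */
void ctx_init(ctx_t *ctx)
{
    ctx->res = NULL;
    ctx->maxn = 0;
}

/*
 * Return a buffer with room for n ints. The buffer is reallocated
 * only when n exceeds this instance's previous maximum, so the
 * state that used to live in function-level statics is now kept
 * per instance. Returns NULL if the allocation fails.
 */
int *ctx_get_buffer(ctx_t *ctx, int n)
{
    assert(n > 0);
    if (n > ctx->maxn) {
        free(ctx->res);
        ctx->maxn = 0;
        ctx->res = malloc((size_t)n * sizeof(int));
        if (ctx->res == NULL)
            return NULL;
        ctx->maxn = n;
    }
    return ctx->res;
}
```

With a function-level static, every caller in the same process shares one buffer and one high-water mark; with the state held in a per-instance structure, each instance (here each ctx_t, in the patch each PMDA) keeps its own.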