From kimbrr@sgi.com Fri May 11 00:30:20 2007
Date: Fri, 11 May 2007 17:11:05 +1000
From: Michael Newton
To: pcp@oss.sgi.com, pcp-announce@sgi.com, pcp-dev@sgi.com, linux-announce@sws1.ornl.gov, nashif@suse.de
Subject: [ANNOUNCE] SGI Performance Co-Pilot 2.7.1-1 now available

SGI is pleased to announce the new version of Performance Co-Pilot (PCP) open source (version 2.7.1-1) is now available
for download from:

    ftp://oss.sgi.com/projects/pcp/download

This is a major release containing bug fixes, code cleanups, new metrics and new platform support (FreeBSD). A list of changes since the last release (which was version 2.5.0-2) is in /usr/share/doc/packages/pcp-2.7.1/CHANGELOG after installation, or at

    http://oss.sgi.com/projects/pcp/latest.html

There are re-built RPMs for i586 (gcc 4.1.x / glibc 2.4.x), x86_64 and ia64 in the above ftp directory. Other Linux platforms will need to build binary RPMs from the SRPM, e.g.:

    # rpmbuild --rebuild pcp-2.7.1-1.src.rpm

or from the tarball, e.g.:

    # tar xvzf pcp-2.7.1-1.src.tar.gz
    # cd pcp-2.7.1
    # ./Makepkgs

Non-linux platforms need to build the source and then manually install, e.g.:

    # tar xvzf pcp-2.7.1-1.src.tar.gz
    # cd pcp-2.7.1
    # make
    # make install

About Performance Co-Pilot (PCP)

PCP is an extensible system monitoring package with a client/server architecture. It provides a distributed unifying abstraction for all interesting performance statistics in /proc and assorted applications (e.g. Apache). The PCP library APIs are robust and well documented, supporting rapid deployment of new and diverse sources of performance data and the development of sophisticated performance monitoring tools.

The PCP homepage is at

    http://oss.sgi.com/projects/pcp

and you can join the PCP mailing list via

    http://oss.sgi.com/projects/pcp/mail.html

SGI would like to thank those who contributed to this and earlier releases.

Thanks

Dr.Michael("Kimba")Newton kimbrr@sgi.com
SGI Engineering

From nscott@aconex.com Thu May 24 20:17:20 2007
Subject: PCP start/stop script regression
From: Nathan Scott
To: kimbrr@sgi.com
Cc: pcp@oss.sgi.com
Date: Fri, 25 May 2007 13:05:48 +1000
Message-Id: <1180062348.6273.575.camel@edge>

Hi,

It looks like the pcp start script (src/pmcd/rc_pcp) has been
changed post pcp-2.5.x to use the file /var/run/pcp/pmcd.pid
for decisions about whether pmcd is running or not. Can you
send details about the problem being solved by this change?

Its causing problems on upgrades from current PCP versions to
the new version (2.7.x), because the old versions of pmcd do
not create this file and hence the sequence:

1. happily running pcp
2. upgrade pcp rpm
3. /etc/init.d/pcp start

causes a failure in the start script (the script thinks no
pmcd is running due to the pmcd.pid file not being found, so
does not do the "kill pmcd" step, spins in a loop for awhile
and eventually exits with ...

[root@nas2 pcp_install]# /etc/init.d/pcp start
Waiting for PMCD to terminate .........Process ...
2793
/etc/init.d/pcp: Warning: PMCD won't die!

(Its not that it wont die, its more that it hasn't been asked)


On a related note:
Would it be possible for PCP changes to be sent out to the
public list for review as well, before committal (as the SGI
XFS guys do for their changes)? e.g.
http://oss.sgi.com/archives/xfs/2007-05/msg00154.html
http://oss.sgi.com/archives/xfs/2007-04/msg00112.html
http://oss.sgi.com/archives/xfs/2007-04/msg00124.html

We'd appreciate it, and I know how difficult it can be to get
your code reviewed inside SGI sometimes ;)

cheers.

-- 
Nathan

From nscott@aconex.com Thu May 24 20:37:12 2007
Subject: Re: PCP start/stop script regression
From: Nathan Scott
To: markgw@sgi.com
Cc: kimbrr@sgi.com, pcp@oss.sgi.com
In-Reply-To: <4656577B.4020105@sgi.com>
References: <1180062348.6273.575.camel@edge> <4656577B.4020105@sgi.com>
Date: Fri, 25 May 2007 13:43:04 +1000
Message-Id: <1180064585.6273.582.camel@edge>

On Fri, 2007-05-25 at 13:26 +1000, Mark Goodwin wrote:
>
> Hi Nathan, yes that's definitely a regression; we'll get it

The patch I'm using atm is attached - its a minimal change. From
what I can tell, Max had a bunch of other PMDA kill logic in there
which has gone in 2.7.x - but I don't know any of the history here,
so can't help much more.

> fixed and spin a new release. And yes, changes like this could
> use community review.

I didn't mean to pick on this change - I'm keen to see all changes
going past just to keep track of things; so, if I see odd behaviour
on our production machines I'll know if theres been changes in that
area recently, which is a big help wrt diagnosis.

thanks!

-- 
Nathan

Content-Disposition: attachment; filename=fix-pcp-start-script

Index: devel-pcp-2.7.1/src/pmcd/rc_pcp
===================================================================
--- devel-pcp-2.7.1.orig/src/pmcd/rc_pcp 2007-05-25 13:22:58.743225750 +1000
+++ devel-pcp-2.7.1/src/pmcd/rc_pcp 2007-05-25 13:23:44.930112250 +1000
@@ -397,6 +397,8 @@ _shutdown()
     then
         kill -TERM `cat $PCP_RUN_DIR/pmcd.pid`
         rm -f $PCP_RUN_DIR/pmcd.pid
+    else
+        $PCP_KILLALL_PROG -TERM pmcd > /dev/null 2>&1
     fi
     $ECHO $PCP_ECHO_N "Waiting for PMCD to terminate ...""$PCP_ECHO_C"
     gone=0

From kimbrr@sgi.com Thu May 24 20:45:12 2007
Date: Fri, 25 May 2007 13:45:03 +1000
From: Michael Newton
To: Nathan Scott
cc: pcp@oss.sgi.com
Subject: Re: PCP start/stop script regression
In-Reply-To: <1180062348.6273.575.camel@edge>
References: <1180062348.6273.575.camel@edge>

On Fri, 25 May 2007, Nathan Scott wrote:
> It looks like the pcp start script (src/pmcd/rc_pcp) has been
> changed post pcp-2.5.x to use the file /var/run/pcp/pmcd.pid
> for decisions about whether pmcd is running or not. Can you
> send details about the problem being solved by this change?

* rpm -e pcp stops pcp
* kill by name means removing pcp in a chroot stops global pcp
  ..so make clean in mangrove would stop pcp on the build machine

there was lots of discussion on this, and i think this was pretty much the
only sensible-looking solution. I originally had a fallback to killall,
which is what you seem to be proposing.. the problem with this being,
as ivan pointed out, it doesnt solve the problem!
ie the chroot stop will still kill the global pcp

(ps im flat out on LRZ stuff right now)

Dr.Michael("Kimba")Newton kimbrr@sgi.com

From markgw@sgi.com Thu May 24 20:45:25 2007
Message-ID: <4656577B.4020105@sgi.com>
Date: Fri, 25 May 2007 13:26:51 +1000
From: Mark Goodwin
To: nscott@aconex.com
CC: kimbrr@sgi.com, pcp@oss.sgi.com
Subject: Re: PCP start/stop script regression
References: <1180062348.6273.575.camel@edge>
In-Reply-To: <1180062348.6273.575.camel@edge>

Hi Nathan, yes that's definitely a regression; we'll get it
fixed and spin a new release. And yes, changes like this could
use community review.

Thanks
-- Mark

Nathan Scott wrote:
> Hi,
>
> It looks like the pcp start script (src/pmcd/rc_pcp) has been
> changed post pcp-2.5.x to use the file /var/run/pcp/pmcd.pid
> for decisions about whether pmcd is running or not. Can you
> send details about the problem being solved by this change?
>
> Its causing problems on upgrades from current PCP versions to
> the new version (2.7.x), because the old versions of pmcd do
> not create this file and hence the sequence:
>
> 1. happily running pcp
> 2. upgrade pcp rpm
> 3. /etc/init.d/pcp start
>
> causes a failure in the start script (the script thinks no
> pmcd is running due to the pmcd.pid file not being found, so
> does not do the "kill pmcd" step, spins in a loop for awhile
> and eventually exits with ...
>
> [root@nas2 pcp_install]# /etc/init.d/pcp start
> Waiting for PMCD to terminate .........Process ...
> 2793
> /etc/init.d/pcp: Warning: PMCD won't die!
>
> (Its not that it wont die, its more that it hasn't been asked)
>
>
> On a related note:
> Would it be possible for PCP changes to be sent out to the
> public list for review as well, before committal (as the SGI
> XFS guys do for their changes)? e.g.
> http://oss.sgi.com/archives/xfs/2007-05/msg00154.html
> http://oss.sgi.com/archives/xfs/2007-04/msg00112.html
> http://oss.sgi.com/archives/xfs/2007-04/msg00124.html
>
> We'd appreciate it, and I know how difficult it can be to get
> your code reviewed inside SGI sometimes ;)
>
> cheers.
>

From makc@melbourne.sgi.com Thu May 24 21:00:25 2007
Message-ID: <18006.23496.258715.649319@kuku.melbourne.sgi.com>
Date: Fri, 25 May 2007 13:45:12 +1000
From: Max Matveev
To: nscott@aconex.com
Cc: markgw@sgi.com, kimbrr@sgi.com, pcp@oss.sgi.com
Subject: Re: PCP start/stop script regression
In-Reply-To: <1180064585.6273.582.camel@edge>
References: <1180062348.6273.575.camel@edge> <4656577B.4020105@sgi.com> <1180064585.6273.582.camel@edge>

>>>>> "nscott" == Nathan Scott writes:

 nscott> On Fri, 2007-05-25 at 13:26 +1000, Mark Goodwin wrote:
 >>
 >> Hi Nathan, yes that's definitely a regression; we'll get it

 nscott> The patch I'm using atm is attached - its a minimal change. From
 nscott> what I can tell, Max had a bunch of other PMDA kill logic in there
 nscott> which has gone in 2.7.x - but I don't know any of the history here,
 nscott> so can't help much more.

Killing logic was reworked to deal with pmcd being killed when pcp was
installed into chroot build environment, aka "serial killer"
problem. It just wasn't reworked sufficiently to deal with
peculiarities of rpm and backward compatibility.

max

PS. Your patch is going to re-introduce the serial killer problem.
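
One way to reconcile the two concerns raised in this thread (upgrades from a pmcd that never wrote pmcd.pid, and the "serial killer" effect of kill-by-name from inside a chroot) is to keep the pid file as the primary mechanism and restrict the by-name fallback to processes that share the caller's root directory. What follows is only a sketch under stated assumptions, not the fix that went into PCP and not code from rc_pcp: it assumes a Linux-style /proc, a stat that accepts -L and -c, and pgrep, and the function names root_id, same_root and stop_pmcd are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not taken from rc_pcp.

# Print "device:inode" for a path, following symlinks
# (assumes a stat that supports -L and -c, e.g. GNU coreutils).
root_id() {
    stat -L -c '%d:%i' "$1" 2>/dev/null
}

# Succeed only if process $1 has the same root directory as this shell.
# Assumes a Linux-style /proc; inspecting other users' processes
# usually requires root.
same_root() {
    mine=$(root_id /) || return 1
    theirs=$(root_id "/proc/$1/root") || return 1
    [ "$mine" = "$theirs" ]
}

# Stop pmcd for run directory $1: use the pid file when it exists,
# otherwise fall back to kill-by-name limited to our own root, so a
# stop issued inside a chroot leaves the global pmcd alone.
stop_pmcd() {
    pidfile="$1/pmcd.pid"
    if [ -f "$pidfile" ]; then
        kill -TERM "$(cat "$pidfile")" 2>/dev/null
        rm -f "$pidfile"
    else
        for pid in $(pgrep -x pmcd); do
            if same_root "$pid"; then
                kill -TERM "$pid"
            fi
        done
    fi
}
```

In an init script this might be called as stop_pmcd "$PCP_RUN_DIR". The root comparison is a heuristic: it says nothing about bind mounts or containers that share a root, and it does not help on platforms without /proc, so it would need review against the build environments discussed above.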

From nscott@aconex.com Thu May 24 21:01:07 2007
Subject: Re: PCP start/stop script regression
From: Nathan Scott
To: Michael Newton
Cc: pcp@oss.sgi.com
References: <1180062348.6273.575.camel@edge>
Date: Fri, 25 May 2007 14:07:00 +1000
Message-Id: <1180066020.6273.593.camel@edge>

On Fri, 2007-05-25 at 13:45 +1000, Michael Newton wrote:
> On Fri, 25 May 2007, Nathan Scott wrote:
> > It looks like the pcp start script (src/pmcd/rc_pcp) has been
> > changed post pcp-2.5.x to use the file /var/run/pcp/pmcd.pid
> > for decisions about whether pmcd is running or not. Can you
> > send details about the problem being solved by this change?
>
> * rpm -e pcp stops pcp
> * kill by name means removing pcp in a chroot stops global pcp
>   ..so make clean in mangrove would stop pcp on the build machine

Ah, now it makes more sense. Thanks.

Would changing the mangrove build to do "rpm -e pcp --noscripts"
resolve this and also allow the upgrade issue to be fixed?

> there was lots of discussion on this, and i think this was pretty much the

Could that discussion be made available to stop me having to second
guess everything y'all have already thought of? ;)

> only sensible-looking solution. I originally had a fallback to killall,
> which is what you seem to be proposing.. the problem with this being,
> as ivan pointed out, it doesnt solve the problem! ie the chroot stop
> will still kill the global pcp

cheers.

-- 
Nathan

From nscott@aconex.com Thu May 24 23:22:35 2007
Subject: Re: PCP start/stop script regression
From: Nathan Scott
To: markgw@sgi.com
Cc: kimbrr@sgi.com, pcp@oss.sgi.com
In-Reply-To: <1180064585.6273.582.camel@edge>
References: <1180062348.6273.575.camel@edge> <4656577B.4020105@sgi.com> <1180064585.6273.582.camel@edge>
Date: Fri, 25 May 2007 16:28:28 +1000
Message-Id: <1180074508.6273.604.camel@edge>

On Fri, 2007-05-25 at 13:43 +1000, Nathan Scott wrote:
> On Fri, 2007-05-25 at 13:26 +1000, Mark Goodwin wrote:
> >
> > Hi Nathan, yes that's definitely a regression; we'll get it
>
> The patch I'm using atm is attached - its a minimal change. From
> ...

And here's a revised patch which gets things working under Cygwin
again for me (couldn't start pmcd because /var/run/pcp didn't exist).
Not 100% clear to me where that's created for the other platforms..?

cheers.

-- 
Nathan

Content-Disposition: attachment; filename=fix-pcp-start-script

Index: devel-pcp-2.7.1/src/pmcd/rc_pcp
===================================================================
--- devel-pcp-2.7.1.orig/src/pmcd/rc_pcp 2007-05-25 13:46:53.800911250 +1000
+++ devel-pcp-2.7.1/src/pmcd/rc_pcp 2007-05-25 15:08:00.253045500 +1000
@@ -397,6 +397,8 @@ _shutdown()
     then
         kill -TERM `cat $PCP_RUN_DIR/pmcd.pid`
         rm -f $PCP_RUN_DIR/pmcd.pid
+    else
+        $PCP_KILLALL_PROG -TERM pmcd > /dev/null 2>&1
     fi
     $ECHO $PCP_ECHO_N "Waiting for PMCD to terminate ...""$PCP_ECHO_C"
     gone=0
@@ -510,6 +512,7 @@ case "$1" in
 Error: PMCD control file '"$PCP_PMCDCONF_PATH"' is missing, cannot start PMCD.'
         exit
     fi
+    [ ! -d $PCP_RUN_DIR ] && mkdir -p $PCP_RUN_DIR
     [ ! -d $RUNDIR ] && mkdir -p $RUNDIR
     cd $RUNDIR

From sgi-pcp@gmane.org Tue May 29 01:05:06 2007
To: pcp@oss.sgi.com
From: Jan-Frode Myklebust
Subject: pmie spawning more than 1 instance per host
Date: Tue, 29 May 2007 09:22:29 +0200
X-original-sender: janfrode@tanso.net

I'm using pmie to monitor about 50 hosts from a central monitor. The
central monitor is running the pmie_check from /etc/cron.hourly/,
and annoyingly it seems to not always be able to detect if an instance
for a host is already running, so after a few days, I end up with more
than one pmie per host.

Anyone else seen this? And maybe have a workaround?

Running v2.7.1 on RHEL4 as the monitoring host, but saw the same problem
on v2.5.0. Clients are a mix of mainly v2.5.0 and v2.7.1. All RHEL4/RHEL5.

  -jf

From makc@melbourne.sgi.com Tue May 29 05:16:51 2007
Message-ID: <18012.6571.795772.948780@kuku.melbourne.sgi.com>
Date: Tue, 29 May 2007 22:16:43 +1000
From: Max Matveev
To: pcp@oss.sgi.com
Subject: FYI: this list is now in subscriber-only mode

Folks,

In trying to stop being a free spam distributor I've changed the list
policy to make it a subscriber-only for posting, anything which comes
from non-members is sent to humans for approval and humans being
humans they're likely to overlook things.

Jan-Frode, your mail was the one which reminded me about this change.

max

From nscott@aconex.com Tue May 29 17:13:56 2007
Subject: Re: pmie spawning more than 1 instance per host
From: Nathan Scott
To: Jan-Frode Myklebust
Cc: pcp@oss.sgi.com
Date: Wed, 30 May 2007 10:20:26 +1000
Message-Id: <1180484426.6273.748.camel@edge>

On Tue, 2007-05-29 at 09:22 +0200, Jan-Frode Myklebust wrote:
> I'm using pmie to monitor about 50 hosts from a central monitor. The
> central monitor is running the pmie_check from /etc/cron.hourly/,
> and annoyingly it seems to not always be able to detect if an instance
> for a host is already running, so after a few days, I end up with more
> than one pmie per host.
>
> Anyone else seen this?
And maybe have a workaround? We run a somewhat similar setup (multiple hosts monitored, RHEL4), and I've never hit this issue. Can you post your pmie control file and 'ps -ef | grep pmie' (preferably when multiple pmie instances incorrectly running)? cheers. -- Nathan From nscott@aconex.com Tue May 29 17:17:28 2007 Received: with ECARTIS (v1.0.0; list pcp); Tue, 29 May 2007 17:17:33 -0700 (PDT) Received: from postoffice.aconex.com (mail.app.aconex.com [203.89.192.138]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l4U0HRWt014417 for ; Tue, 29 May 2007 17:17:28 -0700 Received: from edge (unknown [203.89.192.141]) by postoffice.aconex.com (Postfix) with ESMTP id 567E592C3D8; Wed, 30 May 2007 10:17:27 +1000 (EST) Subject: Re: FYI: this list is now in subscriber-only mode From: Nathan Scott Reply-To: nscott@aconex.com To: Max Matveev Cc: pcp@oss.sgi.com In-Reply-To: <18012.6571.795772.948780@kuku.melbourne.sgi.com> References: <18012.6571.795772.948780@kuku.melbourne.sgi.com> Content-Type: text/plain Organization: Aconex Date: Wed, 30 May 2007 10:23:59 +1000 Message-Id: <1180484639.6273.753.camel@edge> Mime-Version: 1.0 X-Mailer: Evolution 2.6.3 Content-Transfer-Encoding: 7bit X-archive-position: 1261 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: nscott@aconex.com Precedence: bulk X-list: pcp On Tue, 2007-05-29 at 22:16 +1000, Max Matveev wrote: > Folks, > > In trying to stop being a free spam distributor I've changed the list > policy to make it a subscriber-only for posting, anything which comes > from non-members is sent to humans for approval and humans being > humans they're likely to overlook things. > > Jan-Frode, your mail was the one which reminded me about this change. Yo Max, Did my mail of yesterday make it to the spam trap? The one with 27 patches attached, Subject line "PCP updates for 2.7.1". 
Is it better from the oss.sgi.com listadmin POV to send them as individual mails? thanks. -- Nathan From makc@melbourne.sgi.com Tue May 29 17:56:27 2007 Received: with ECARTIS (v1.0.0; list pcp); Tue, 29 May 2007 17:56:31 -0700 (PDT) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id l4U0uNWt002839 for ; Tue, 29 May 2007 17:56:26 -0700 Received: from kuku.melbourne.sgi.com (kuku.melbourne.sgi.com [134.14.55.163]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id KAA01340; Wed, 30 May 2007 10:56:20 +1000 Received: from kuku.melbourne.sgi.com (localhost [127.0.0.1]) by kuku.melbourne.sgi.com (SGI-8.12.11.20060308/8.12.11) with ESMTP id l4U0uJ1W892707; Wed, 30 May 2007 10:56:19 +1000 (EST) Received: (from makc@localhost) by kuku.melbourne.sgi.com (SGI-8.12.11.20060308/8.12.11/Submit) id l4U0uJQ4863071; Wed, 30 May 2007 10:56:19 +1000 (EST) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <18012.52146.970752.270244@kuku.melbourne.sgi.com> Date: Wed, 30 May 2007 10:56:18 +1000 From: Max Matveev To: nscott@aconex.com cc: pcp@oss.sgi.com Subject: Re: FYI: this list is now in subscriber-only mode In-Reply-To: <1180484639.6273.753.camel@edge> References: <18012.6571.795772.948780@kuku.melbourne.sgi.com> <1180484639.6273.753.camel@edge> X-Mailer: VM 7.07 under 21.4 (patch 15) "Security Through Obscurity" XEmacs Lucid X-archive-position: 1262 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: makc@sgi.com Precedence: bulk X-list: pcp >>>>> "nscott" == Nathan Scott writes: nscott> Did my mail of yesterday make it to the spam trap? Haven't seen it yet. nscott> The one with 27 patches attached, Subject line "PCP updates nscott> for 2.7.1". Is it better from the oss.sgi.com listadmin POV nscott> to send them as individual mails? 
AFAIK, pcp list strips attachements, so just put your patch into the mail body or send it directly to the maintainer. max From kimbrr@sgi.com Tue May 29 17:58:28 2007 Received: with ECARTIS (v1.0.0; list pcp); Tue, 29 May 2007 17:58:32 -0700 (PDT) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id l4U0wPWt003553 for ; Tue, 29 May 2007 17:58:27 -0700 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id KAA01387; Wed, 30 May 2007 10:58:20 +1000 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id l4U0wIAf108703402; Wed, 30 May 2007 10:58:19 +1000 (AEST) Received: from localhost (kimbrr@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) with ESMTP id l4U0wHpO101986141; Wed, 30 May 2007 10:58:17 +1000 (AEST) X-Authentication-Warning: snort.melbourne.sgi.com: kimbrr owned process doing -bs Date: Wed, 30 May 2007 10:58:17 +1000 From: Michael Newton X-X-Sender: kimbrr@snort.melbourne.sgi.com To: Nathan Scott cc: Jan-Frode Myklebust , pcp@oss.sgi.com Subject: Re: pmie spawning more than 1 instance per host In-Reply-To: <1180484426.6273.748.camel@edge> Message-ID: References: <1180484426.6273.748.camel@edge> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1263 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: kimbrr@sgi.com Precedence: bulk X-list: pcp On Wed, 30 May 2007, Nathan Scott wrote: > On Tue, 2007-05-29 at 09:22 +0200, Jan-Frode Myklebust wrote: > > I'm using pmie to monitor about 50 hosts from a central monitor. 
The > > central monitor is running the pmie_check from /etc/cron.hourly/, > > and annoyingly it seems to not always be able to detect if an instance > > for a host is already running, so after a few days, I end up with more > > than one pmie per host. > > > > Anyone else seen this? And maybe have a workaround? > > We run a somewhat similar setup (multiple hosts monitored, RHEL4), > and I've never hit this issue. Can you post your pmie control file > and 'ps -ef | grep pmie' (preferably when multiple pmie instances > incorrectly running)? I got it, thanks Dr.Michael("Kimba")Newton kimbrr@sgi.com From markgw@sgi.com Tue May 29 18:05:07 2007 Received: with ECARTIS (v1.0.0; list pcp); Tue, 29 May 2007 18:05:10 -0700 (PDT) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id l4U153Wt004617 for ; Tue, 29 May 2007 18:05:05 -0700 Received: from [134.14.55.17] (dhcp17.melbourne.sgi.com [134.14.55.17]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA01603; Wed, 30 May 2007 11:04:53 +1000 Message-ID: <465CCDAB.7030706@sgi.com> Date: Wed, 30 May 2007 11:04:43 +1000 From: Mark Goodwin Reply-To: markgw@sgi.com Organization: SGI Engineering User-Agent: Thunderbird 1.5.0.10 (Windows/20070221) MIME-Version: 1.0 To: Max Matveev CC: nscott@aconex.com, pcp@oss.sgi.com Subject: Re: FYI: this list is now in subscriber-only mode References: <18012.6571.795772.948780@kuku.melbourne.sgi.com> <1180484639.6273.753.camel@edge> <18012.52146.970752.270244@kuku.melbourne.sgi.com> In-Reply-To: <18012.52146.970752.270244@kuku.melbourne.sgi.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 1264 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: markgw@sgi.com Precedence: bulk X-list: pcp Max Matveev wrote: >>>>>> "nscott" == Nathan Scott writes: > 
> nscott> Did my mail of yesterday make it to the spam trap? > Haven't seen it yet. neither did I > nscott> The one with 27 patches attached, Subject line "PCP updates > nscott> for 2.7.1". Is it better from the oss.sgi.com listadmin POV > nscott> to send them as individual mails? > AFAIK, pcp list strips attachements, so just put your patch into the > mail body or send it directly to the maintainer. That's not going to do for community review requests. Can we change the list handler to not strip attachments? -- Mark From kimbrr@sgi.com Tue May 29 18:15:54 2007 Received: with ECARTIS (v1.0.0; list pcp); Tue, 29 May 2007 18:15:58 -0700 (PDT) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id l4U1FoWt006471 for ; Tue, 29 May 2007 18:15:53 -0700 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA01836; Wed, 30 May 2007 11:15:45 +1000 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id l4U1FgAf108972850; Wed, 30 May 2007 11:15:43 +1000 (AEST) Received: from localhost (kimbrr@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) with ESMTP id l4U1FeOI107047885; Wed, 30 May 2007 11:15:42 +1000 (AEST) X-Authentication-Warning: snort.melbourne.sgi.com: kimbrr owned process doing -bs Date: Wed, 30 May 2007 11:15:40 +1000 From: Michael Newton X-X-Sender: kimbrr@snort.melbourne.sgi.com To: Nathan Scott cc: Jan-Frode Myklebust , pcp@oss.sgi.com Subject: Re: pmie spawning more than 1 instance per host In-Reply-To: Message-ID: References: <1180484426.6273.748.camel@edge> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1265 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: kimbrr@sgi.com Precedence: 
bulk X-list: pcp On Wed, 30 May 2007, Michael Newton wrote: > On Wed, 30 May 2007, Nathan Scott wrote: > > On Tue, 2007-05-29 at 09:22 +0200, Jan-Frode Myklebust wrote: > > > I'm using pmie to monitor about 50 hosts from a central monitor. The > > > central monitor is running the pmie_check from /etc/cron.hourly/, > > > and annoyingly it seems to not always be able to detect if an instance > > > for a host is already running, so after a few days, I end up with more > > > than one pmie per host. > > > > > > Anyone else seen this? And maybe have a workaround? > > > > We run a somewhat similar setup (multiple hosts monitored, RHEL4), > > and I've never hit this issue. Can you post your pmie control file > > and 'ps -ef | grep pmie' (preferably when multiple pmie instances > > incorrectly running)? > > I got it, thanks sorry people.. as you can see im a bit scattered. That reply was meant to be to Nathan's mail about the patches he sent .. doh! and of course i got them, since they were directly addressed to me.. double doh! 
hopefully the LRZ panic will subside very shortly and i can start to look at some of these things thanks Dr.Michael("Kimba")Newton kimbrr@sgi.com From makc@melbourne.sgi.com Wed May 30 01:38:08 2007 Received: with ECARTIS (v1.0.0; list pcp); Wed, 30 May 2007 01:38:13 -0700 (PDT) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id l4U8c5Wt029007 for ; Wed, 30 May 2007 01:38:07 -0700 Received: from kuku.melbourne.sgi.com (kuku.melbourne.sgi.com [134.14.55.163]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id SAA11852; Wed, 30 May 2007 18:38:02 +1000 Received: from kuku.melbourne.sgi.com (localhost [127.0.0.1]) by kuku.melbourne.sgi.com (SGI-8.12.11.20060308/8.12.11) with ESMTP id l4U8c1EF893091; Wed, 30 May 2007 18:38:01 +1000 (EST) Received: (from makc@localhost) by kuku.melbourne.sgi.com (SGI-8.12.11.20060308/8.12.11/Submit) id l4U8c1cH892458; Wed, 30 May 2007 18:38:01 +1000 (EST) MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="k+bKP6E0Eo" Content-Transfer-Encoding: 7bit Message-ID: <18013.14313.246792.250856@kuku.melbourne.sgi.com> Date: Wed, 30 May 2007 18:38:01 +1000 From: Max Matveev To: markgw@sgi.com Cc: pcp@oss.sgi.com Subject: Re: FYI: this list is now in subscriber-only mode In-Reply-To: <465CCDAB.7030706@sgi.com> References: <18012.6571.795772.948780@kuku.melbourne.sgi.com> <1180484639.6273.753.camel@edge> <18012.52146.970752.270244@kuku.melbourne.sgi.com> <465CCDAB.7030706@sgi.com> X-Mailer: VM 7.07 under 21.4 (patch 15) "Security Through Obscurity" XEmacs Lucid X-archive-position: 1266 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: makc@sgi.com Precedence: bulk X-list: pcp --k+bKP6E0Eo Content-Type: text/plain; charset=us-ascii Content-Description: message body text Content-Transfer-Encoding: 7bit >>>>> "markgw" == Mark Goodwin writes: markgw> That's not 
going to do for community review requests. markgw> Can we change the list handler to not strip attachments? Actually, text attachments should be OK, let's see. max --k+bKP6E0Eo Content-Type: text/plain Content-Description: Sample attachment Content-Disposition: inline; filename="uuk" Content-Transfer-Encoding: 7bit This is supposed to be allowed.... --k+bKP6E0Eo-- From janfrode@tanso.net Wed May 30 01:41:30 2007 Received: with ECARTIS (v1.0.0; list pcp); Wed, 30 May 2007 02:13:07 -0700 (PDT) Received: from ag-out-0708.google.com (ag-out-0708.google.com [72.14.246.241]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l4U8fSWt029960 for ; Wed, 30 May 2007 01:41:29 -0700 Received: by ag-out-0708.google.com with SMTP id 23so1474930agd for ; Wed, 30 May 2007 01:41:28 -0700 (PDT) Received: by 10.90.93.6 with SMTP id q6mr5730436agb.1180513583246; Wed, 30 May 2007 01:26:23 -0700 (PDT) Received: from lc4eb6380248654.ibm.com ( [213.167.114.230]) by mx.google.com with ESMTP id q30sm7828432wrq.2007.05.30.01.26.21; Wed, 30 May 2007 01:26:22 -0700 (PDT) Received: from lc4eb6380248654.ibm.com (localhost.localdomain [127.0.0.1]) by lc4eb6380248654.ibm.com (8.13.8/8.13.8) with ESMTP id l4U8MJtp020591; Wed, 30 May 2007 10:22:19 +0200 Received: (from janfrode@localhost) by lc4eb6380248654.ibm.com (8.13.8/8.13.8/Submit) id l4U8MIxg020570; Wed, 30 May 2007 10:22:18 +0200 X-Authentication-Warning: lc4eb6380248654.ibm.com: janfrode set sender to janfrode@tanso.net using -f Date: Wed, 30 May 2007 10:22:18 +0200 From: Jan-Frode Myklebust To: Nathan Scott Cc: pcp@oss.sgi.com Subject: Re: pmie spawning more than 1 instance per host Message-ID: <20070530082218.GA6332@lc4eb6380248654.ibm.com> References: <1180484426.6273.748.camel@edge> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1180484426.6273.748.camel@edge> User-Agent: Mutt/1.4.2.2i X-archive-position: 1267 X-Approved-By: makc@sgi.com X-ecartis-version: Ecartis v1.0.0 
Sender: pcp-bounce@oss.sgi.com
Errors-to: pcp-bounce@oss.sgi.com
X-original-sender: janfrode@tanso.net
Precedence: bulk
X-list: pcp

On Wed, May 30, 2007 at 10:20:26AM +1000, Nathan Scott wrote:
>
> We run a somewhat similar setup (multiple hosts monitored, RHEL4),
> and I've never hit this issue. Can you post your pmie control file
> and 'ps -ef | grep pmie' (preferably when multiple pmie instances
> incorrectly running)?

I stopped and restarted all pmie's yesterday, and now today we have
11 duplicate instances. Here's my control-file:

---------------------------------------------------------
$ cat control |grep -v ^#|grep -v ^$
$version=1.0
bpf.mydomain.com n PCP_LOG_DIR/pmie/bpf.mydomain.com/pmie.log -c config.mydomain
mail1.mydomain.com n PCP_LOG_DIR/pmie/mail1.mydomain.com/pmie.log -c config.mydomain
ntp1isp.mydomain.com n PCP_LOG_DIR/pmie/ntp1isp.mydomain.com/pmie.log -c config.mydomain
transit.mydomain.com n PCP_LOG_DIR/pmie/transit.mydomain.com/pmie.log -c config.transit
provdbs1.mydomain.com n PCP_LOG_DIR/pmie/provdbs1.mydomain.com/pmie.log -c config.mydomain
provdbm1.mydomain.com n PCP_LOG_DIR/pmie/provdbm1.mydomain.com/pmie.log -c config.mydomain
stl1.mydomain.com n PCP_LOG_DIR/pmie/stl1.mydomain.com/pmie.log -c config.mydomain
maildbm1.mydomain.com n PCP_LOG_DIR/pmie/maildbm1.mydomain.com/pmie.log -c config.mydomain
porting.mydomain.com n PCP_LOG_DIR/pmie/porting.mydomain.com/pmie.log -c config.mydomain
billing1.mydomain.com n PCP_LOG_DIR/pmie/billing1.mydomain.com/pmie.log -c config.mydomain
portal1.mydomain.com n PCP_LOG_DIR/pmie/portal1.mydomain.com/pmie.log -c config.mydomain
dhcp2voip.mydomain.com n PCP_LOG_DIR/pmie/dhcp2voip.mydomain.com/pmie.log -c config.mydomain
dhcp2tv.mydomain.com n PCP_LOG_DIR/pmie/dhcp2tv.mydomain.com/pmie.log -c config.mydomain
dhcp2isp.mydomain.com n PCP_LOG_DIR/pmie/dhcp2isp.mydomain.com/pmie.log -c config.mydomain
dhcp1voip.mydomain.com n PCP_LOG_DIR/pmie/dhcp1voip.mydomain.com/pmie.log -c config.mydomain
dhcp1tv.mydomain.com n PCP_LOG_DIR/pmie/dhcp1tv.mydomain.com/pmie.log -c config.mydomain
dhcp1isp.mydomain.com n PCP_LOG_DIR/pmie/dhcp1isp.mydomain.com/pmie.log -c config.mydomain
m1dhcp1.mydomain.com n PCP_LOG_DIR/pmie/m1dhcp1.mydomain.com/pmie.log -c config.mydomain
m1dhcp2.mydomain.com n PCP_LOG_DIR/pmie/m1dhcp2.mydomain.com/pmie.log -c config.mydomain
m2dhcp1.mydomain.com n PCP_LOG_DIR/pmie/m2dhcp1.mydomain.com/pmie.log -c config.mydomain
m2dhcp2.mydomain.com n PCP_LOG_DIR/pmie/m2dhcp2.mydomain.com/pmie.log -c config.mydomain
http1.mydomain.com n PCP_LOG_DIR/pmie/http1.mydomain.com/pmie.log -c config.mydomain
ns1.mydomain.com n PCP_LOG_DIR/pmie/ns1.mydomain.com/pmie.log -c config.mydomain
ns2.mydomain.com n PCP_LOG_DIR/pmie/ns2.mydomain.com/pmie.log -c config.mydomain
ldapm1.mydomain.com n PCP_LOG_DIR/pmie/ldapm1.mydomain.com/pmie.log -c config.mydomain
ldapm2.mydomain.com n PCP_LOG_DIR/pmie/ldapm2.mydomain.com/pmie.log -c config.mydomain
tvservices.mydomain.com n PCP_LOG_DIR/pmie/tvservices.mydomain.com/pmie.log -c config.mydomain
prov1.mydomain.com n PCP_LOG_DIR/pmie/prov1.mydomain.com/pmie.log -c config.mydomain
ztc1.mydomain.com n PCP_LOG_DIR/pmie/ztc1.mydomain.com/pmie.log -c config.mydomain
ztc2.mydomain.com n PCP_LOG_DIR/pmie/ztc2.mydomain.com/pmie.log -c config.mydomain
emergency.mydomain.com n PCP_LOG_DIR/pmie/emergency.mydomain.com/pmie.log -c config.mydomain
hermes.mydomain.com n PCP_LOG_DIR/pmie/hermes.mydomain.com/pmie.log -c config.mydomain
log01.mydomain.com n PCP_LOG_DIR/pmie/log01.mydomain.com/pmie.log -c config.mydomain
atmail1.mydomain.com n PCP_LOG_DIR/pmie/atmail1.mydomain.com/pmie.log -c config.mydomain
atmail2.mydomain.com n PCP_LOG_DIR/pmie/atmail2.mydomain.com/pmie.log -c config.mydomain
smtp1.mydomain.com n PCP_LOG_DIR/pmie/smtp1.mydomain.com/pmie.log -c config.mydomain
smtp2.mydomain.com n PCP_LOG_DIR/pmie/smtp2.mydomain.com/pmie.log -c config.mydomain
maildb2.mydomain.com n PCP_LOG_DIR/pmie/maildb2.mydomain.com/pmie.log -c config.mydomain
mw1.mydomain.com n PCP_LOG_DIR/pmie/mw1.mydomain.com/pmie.log -c config.mydomain
asavdb1.mydomain.com n PCP_LOG_DIR/pmie/asavdb1.mydomain.com/pmie.log -c config.mydomain
asav1.mydomain.com n PCP_LOG_DIR/pmie/asav1.mydomain.com/pmie.log -c config.mydomain
asav2.mydomain.com n PCP_LOG_DIR/pmie/asav2.mydomain.com/pmie.log -c config.mydomain
asav3.mydomain.com n PCP_LOG_DIR/pmie/asav3.mydomain.com/pmie.log -c config.mydomain
asav4.mydomain.com n PCP_LOG_DIR/pmie/asav4.mydomain.com/pmie.log -c config.mydomain
asav5.mydomain.com n PCP_LOG_DIR/pmie/asav5.mydomain.com/pmie.log -c config.mydomain
asav6.mydomain.com n PCP_LOG_DIR/pmie/asav6.mydomain.com/pmie.log -c config.mydomain
mobileprov.mydomain.com n PCP_LOG_DIR/pmie/mobileprov.mydomain.com/pmie.log -c config.mydomain
wiki.mydomain.com n PCP_LOG_DIR/pmie/wiki.mydomain.com/pmie.log -c config.mydomain
---------------------------------------------------------

And a 'ps -ef | grep pmie':

---------------------------------------------------------
$ ps -ef|grep pmie
root 29365 1 0 May29 ? 00:00:00 pmie -b -h bpf.mydomain.com -l /var/log/pcp/pmie/bpf.mydomain.com/pmie.log -c config.mydomain
root 29421 1 0 May29 ? 00:00:01 pmie -b -h mail1.mydomain.com -l /var/log/pcp/pmie/mail1.mydomain.com/pmie.log -c config.mydomain
root 29485 1 0 May29 ? 00:00:22 pmie -b -h ntp1isp.mydomain.com -l /var/log/pcp/pmie/ntp1isp.mydomain.com/pmie.log -c config.mydomain
root 29561 1 0 May29 ? 00:00:00 pmie -b -h transit.mydomain.com -l /var/log/pcp/pmie/transit.mydomain.com/pmie.log -c config.transit
root 29649 1 0 May29 ? 00:00:00 pmie -b -h provdbs1.mydomain.com -l /var/log/pcp/pmie/provdbs1.mydomain.com/pmie.log -c config.mydomain
root 29749 1 0 May29 ? 00:00:00 pmie -b -h provdbm1.mydomain.com -l /var/log/pcp/pmie/provdbm1.mydomain.com/pmie.log -c config.mydomain
root 29865 1 0 May29 ? 00:00:00 pmie -b -h stl1.mydomain.com -l /var/log/pcp/pmie/stl1.mydomain.com/pmie.log -c config.mydomain
root 30000 1 0 May29 ? 00:00:00 pmie -b -h maildbm1.mydomain.com -l /var/log/pcp/pmie/maildbm1.mydomain.com/pmie.log -c config.mydomain
root 30136 1 0 May29 ? 00:00:00 pmie -b -h porting.mydomain.com -l /var/log/pcp/pmie/porting.mydomain.com/pmie.log -c config.mydomain
root 30284 1 0 May29 ? 00:00:00 pmie -b -h billing1.mydomain.com -l /var/log/pcp/pmie/billing1.mydomain.com/pmie.log -c config.mydomain
root 30444 1 0 May29 ? 00:00:00 pmie -b -h portal1.mydomain.com -l /var/log/pcp/pmie/portal1.mydomain.com/pmie.log -c config.mydomain
root 30625 1 0 May29 ? 00:00:00 pmie -b -h dhcp2voip.mydomain.com -l /var/log/pcp/pmie/dhcp2voip.mydomain.com/pmie.log -c config.mydomain
root 30816 1 0 May29 ? 00:00:00 pmie -b -h dhcp2tv.mydomain.com -l /var/log/pcp/pmie/dhcp2tv.mydomain.com/pmie.log -c config.mydomain
root 31012 1 0 May29 ? 00:00:23 pmie -b -h dhcp2isp.mydomain.com -l /var/log/pcp/pmie/dhcp2isp.mydomain.com/pmie.log -c config.mydomain
root 31220 1 0 May29 ? 00:00:00 pmie -b -h dhcp1voip.mydomain.com -l /var/log/pcp/pmie/dhcp1voip.mydomain.com/pmie.log -c config.mydomain
root 31440 1 0 May29 ? 00:00:00 pmie -b -h dhcp1tv.mydomain.com -l /var/log/pcp/pmie/dhcp1tv.mydomain.com/pmie.log -c config.mydomain
root 31672 1 0 May29 ? 00:00:23 pmie -b -h dhcp1isp.mydomain.com -l /var/log/pcp/pmie/dhcp1isp.mydomain.com/pmie.log -c config.mydomain
root 31922 1 0 May29 ? 00:00:00 pmie -b -h m1dhcp1.mydomain.com -l /var/log/pcp/pmie/m1dhcp1.mydomain.com/pmie.log -c config.mydomain
root 32178 1 0 May29 ? 00:00:00 pmie -b -h m1dhcp2.mydomain.com -l /var/log/pcp/pmie/m1dhcp2.mydomain.com/pmie.log -c config.mydomain
root 32446 1 0 May29 ? 00:00:00 pmie -b -h m2dhcp1.mydomain.com -l /var/log/pcp/pmie/m2dhcp1.mydomain.com/pmie.log -c config.mydomain
root 32727 1 0 May29 ? 00:00:00 pmie -b -h m2dhcp2.mydomain.com -l /var/log/pcp/pmie/m2dhcp2.mydomain.com/pmie.log -c config.mydomain
root 574 1 0 May29 ? 00:00:00 pmie -b -h http1.mydomain.com -l /var/log/pcp/pmie/http1.mydomain.com/pmie.log -c config.mydomain
root 886 1 0 May29 ? 00:00:00 pmie -b -h ns1.mydomain.com -l /var/log/pcp/pmie/ns1.mydomain.com/pmie.log -c config.mydomain
root 1208 1 0 May29 ? 00:00:00 pmie -b -h ns2.mydomain.com -l /var/log/pcp/pmie/ns2.mydomain.com/pmie.log -c config.mydomain
root 1539 1 0 May29 ? 00:00:00 pmie -b -h ldapm1.mydomain.com -l /var/log/pcp/pmie/ldapm1.mydomain.com/pmie.log -c config.mydomain
root 1879 1 0 May29 ? 00:00:00 pmie -b -h ldapm2.mydomain.com -l /var/log/pcp/pmie/ldapm2.mydomain.com/pmie.log -c config.mydomain
root 2233 1 0 May29 ? 00:00:00 pmie -b -h tvservices.mydomain.com -l /var/log/pcp/pmie/tvservices.mydomain.com/pmie.log -c config.mydomain
root 2604 1 0 May29 ? 00:00:00 pmie -b -h prov1.mydomain.com -l /var/log/pcp/pmie/prov1.mydomain.com/pmie.log -c config.mydomain
root 3007 1 0 May29 ? 00:00:00 pmie -b -h ztc1.mydomain.com -l /var/log/pcp/pmie/ztc1.mydomain.com/pmie.log -c config.mydomain
root 3395 1 0 May29 ? 00:00:00 pmie -b -h ztc2.mydomain.com -l /var/log/pcp/pmie/ztc2.mydomain.com/pmie.log -c config.mydomain
root 3795 1 0 May29 ? 00:00:00 pmie -b -h emergency.mydomain.com -l /var/log/pcp/pmie/emergency.mydomain.com/pmie.log -c config.mydomain
root 4207 1 0 May29 ? 00:00:00 pmie -b -h hermes.mydomain.com -l /var/log/pcp/pmie/hermes.mydomain.com/pmie.log -c config.mydomain
root 4634 1 0 May29 ? 00:00:00 pmie -b -h log01.mydomain.com -l /var/log/pcp/pmie/log01.mydomain.com/pmie.log -c config.mydomain
root 5077 1 0 May29 ? 00:00:01 pmie -b -h atmail1.mydomain.com -l /var/log/pcp/pmie/atmail1.mydomain.com/pmie.log -c config.mydomain
root 5525 1 0 May29 ? 00:00:02 pmie -b -h atmail2.mydomain.com -l /var/log/pcp/pmie/atmail2.mydomain.com/pmie.log -c config.mydomain
root 5985 1 0 May29 ? 00:00:02 pmie -b -h smtp1.mydomain.com -l /var/log/pcp/pmie/smtp1.mydomain.com/pmie.log -c config.mydomain
root 6457 1 0 May29 ? 00:00:02 pmie -b -h smtp2.mydomain.com -l /var/log/pcp/pmie/smtp2.mydomain.com/pmie.log -c config.mydomain
root 6945 1 0 May29 ? 00:00:03 pmie -b -h maildb2.mydomain.com -l /var/log/pcp/pmie/maildb2.mydomain.com/pmie.log -c config.mydomain
root 7444 1 0 May29 ? 00:00:00 pmie -b -h mw1.mydomain.com -l /var/log/pcp/pmie/mw1.mydomain.com/pmie.log -c config.mydomain
root 7953 1 0 May29 ? 00:00:00 pmie -b -h asavdb1.mydomain.com -l /var/log/pcp/pmie/asavdb1.mydomain.com/pmie.log -c config.mydomain
root 8474 1 0 May29 ? 00:00:00 pmie -b -h asav1.mydomain.com -l /var/log/pcp/pmie/asav1.mydomain.com/pmie.log -c config.mydomain
root 9007 1 0 May29 ? 00:00:27 pmie -b -h asav2.mydomain.com -l /var/log/pcp/pmie/asav2.mydomain.com/pmie.log -c config.mydomain
root 9557 1 0 May29 ? 00:00:00 pmie -b -h asav3.mydomain.com -l /var/log/pcp/pmie/asav3.mydomain.com/pmie.log -c config.mydomain
root 10114 1 0 May29 ? 00:00:28 pmie -b -h asav4.mydomain.com -l /var/log/pcp/pmie/asav4.mydomain.com/pmie.log -c config.mydomain
root 10682 1 0 May29 ? 00:00:00 pmie -b -h asav5.mydomain.com -l /var/log/pcp/pmie/asav5.mydomain.com/pmie.log -c config.mydomain
root 11263 1 0 May29 ? 00:00:00 pmie -b -h asav6.mydomain.com -l /var/log/pcp/pmie/asav6.mydomain.com/pmie.log -c config.mydomain
root 11859 1 0 May29 ? 00:00:00 pmie -b -h mobileprov.mydomain.com -l /var/log/pcp/pmie/mobileprov.mydomain.com/pmie.log -c config.mydomain
root 12492 1 0 May29 ? 00:00:00 pmie -b -h wiki.mydomain.com -l /var/log/pcp/pmie/wiki.mydomain.com/pmie.log -c config.mydomain
root 17487 1 0 May29 ? 00:00:00 pmie -b -h dhcp2voip.mydomain.com -l /var/log/pcp/pmie/dhcp2voip.mydomain.com/pmie.log -c config.mydomain
root 18798 1 0 May29 ? 00:00:00 pmie -b -h dhcp1voip.mydomain.com -l /var/log/pcp/pmie/dhcp1voip.mydomain.com/pmie.log -c config.mydomain
root 19444 1 0 May29 ? 00:00:00 pmie -b -h dhcp1tv.mydomain.com -l /var/log/pcp/pmie/dhcp1tv.mydomain.com/pmie.log -c config.mydomain
root 23375 1 0 May29 ? 00:00:00 pmie -b -h ns2.mydomain.com -l /var/log/pcp/pmie/ns2.mydomain.com/pmie.log -c config.mydomain
root 480 1 0 May29 ? 00:00:00 pmie -b -h mobileprov.mydomain.com -l /var/log/pcp/pmie/mobileprov.mydomain.com/pmie.log -c config.mydomain
root 567 1 0 May29 ? 00:00:00 pmie -b -h m1dhcp2.mydomain.com -l /var/log/pcp/pmie/m1dhcp2.mydomain.com/pmie.log -c config.mydomain
root 3722 1 0 May29 ? 00:00:00 pmie -b -h tvservices.mydomain.com -l /var/log/pcp/pmie/tvservices.mydomain.com/pmie.log -c config.mydomain
root 4424 1 0 May29 ? 00:00:00 pmie -b -h prov1.mydomain.com -l /var/log/pcp/pmie/prov1.mydomain.com/pmie.log -c config.mydomain
root 5145 1 0 May29 ? 00:00:00 pmie -b -h ztc1.mydomain.com -l /var/log/pcp/pmie/ztc1.mydomain.com/pmie.log -c config.mydomain
root 6838 1 0 May29 ? 00:00:00 pmie -b -h hermes.mydomain.com -l /var/log/pcp/pmie/hermes.mydomain.com/pmie.log -c config.mydomain
root 19073 1 0 May29 ? 00:00:00 pmie -b -h ztc2.mydomain.com -l /var/log/pcp/pmie/ztc2.mydomain.com/pmie.log -c config.mydomain
---------------------------------------------------------

The duplicates are on dhcp1tv, dhcp1voip, dhcp2voip, hermes, m1dhcp2,
mobileprov, ns2, prov1, tvservices, ztc1 and ztc2. When I run a
/etc/init.d/pmie stop, these 11 will not stop and I have to kill them
manually. After I start pmie from the initscript again, there's only
one instance for each host, so I'm pretty confident it's the
pmie_check that's mistakenly spawning these.
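
For a listing this long, one way to pick out the duplicated hosts is to count pmie processes per `-h` argument. The following is only a minimal sketch: `find_duplicate_pmie_hosts` is a hypothetical helper, not part of PCP, and it assumes the command shows up in the ps output as the bare word `pmie` followed by a `-h <host>` pair, as in the listing above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with PCP): reads `ps -ef`-style lines
# on stdin and prints "<count> <host>" for every host that has more than
# one "pmie ... -h <host>" process.
find_duplicate_pmie_hosts() {
    awk '
        {
            is_pmie = 0
            for (i = 1; i <= NF; i++) {
                # only look for -h after the pmie command word itself
                if ($i == "pmie") is_pmie = 1
                if (is_pmie && $i == "-h" && i < NF) {
                    count[$(i + 1)]++
                    break
                }
            }
        }
        END {
            for (host in count)
                if (count[host] > 1) print count[host], host
        }
    ' | sort -k2
}

# Typical use on the monitoring host:
#   ps -efw | find_duplicate_pmie_hosts
```

Lines that do not contain a `pmie` word followed by `-h` (including the `grep pmie` process itself) are ignored, so the output should be limited to hosts that really have more than one instance.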
-jf

From nscott@aconex.com Wed May 30 22:31:54 2007
Received: with ECARTIS (v1.0.0; list pcp); Wed, 30 May 2007 22:32:00 -0700 (PDT)
Received: from postoffice.aconex.com (mail.app.aconex.com [203.89.192.138]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l4V5VpWt002913 for ; Wed, 30 May 2007 22:31:53 -0700
Received: from edge (unknown [203.89.192.141]) by postoffice.aconex.com (Postfix) with ESMTP id 788CC92C6C6; Thu, 31 May 2007 15:31:49 +1000 (EST)
Subject: Re: pmie spawning more than 1 instance per host
From: Nathan Scott
Reply-To: nscott@aconex.com
To: Jan-Frode Myklebust
Cc: pcp@oss.sgi.com
In-Reply-To: <20070530082218.GA6332@lc4eb6380248654.ibm.com>
References: <1180484426.6273.748.camel@edge> <20070530082218.GA6332@lc4eb6380248654.ibm.com>
Content-Type: multipart/mixed; boundary="=-IjpXYaVQm9Bkbn9mTlb4"
Organization: Aconex
Date: Thu, 31 May 2007 15:38:31 +1000
Message-Id: <1180589911.6273.770.camel@edge>
Mime-Version: 1.0
X-Mailer: Evolution 2.6.3
X-archive-position: 1268
X-ecartis-version: Ecartis v1.0.0
Sender: pcp-bounce@oss.sgi.com
Errors-to: pcp-bounce@oss.sgi.com
X-original-sender: nscott@aconex.com
Precedence: bulk
X-list: pcp

--=-IjpXYaVQm9Bkbn9mTlb4
Content-Type: text/plain
Content-Transfer-Encoding: 7bit

On Wed, 2007-05-30 at 10:22 +0200, Jan-Frode Myklebust wrote:
>
>
> The duplicates are on dhcp1tv, dhcp1voip, dhcp2voip, hermes, m1dhcp2,
> mobileprov, ns2, prov1, tvservices, ztc1 and ztc2. When I run a
> /etc/init.d/pmie stop, these 11 will not stop and I have to kill them
> manually. After I start pmie from the initscript again, there's only
> one instance for each host, so I'm pretty confident it's the
> pmie_check that's mistakenly spawning these.

The /etc/init.d/pmie start script actually calls pmie_check to do the
work of stopping and starting the pmies, so I agree it's very likely
the problem lies in pmie_check.

From reviewing the pmie_check code, a few things stand out.
Firstly, this script hasn't been updated to use the platform-independent
_get_pids_by_name like the pmlogger_check script has (and, in fact, like
the pmie start script has too). Nor does it have the PWD fix from SGI
PV #595416 that pmlogger_check does, whatever that bug was (I dunno,
it's s3krit SGI stuff :).

I've switched the script over to have these now, and also added the
additional "very verbose" (-V -V) diagnostics that the pmlogger_check
script has - could you try out the attached script, in place of your
current /usr/share/pcp/bin/pmie_check? (Note that you can call this
by hand, at any time, to stop/start your pmie instances - and it also
has a "show me" mode (-N) that won't stop/start any, but will tell you
if anything would have been changed and what commands would have been
run).

If you still see the problem with this script, can you capture the
ps -ef (ps -efw, ideally, cos that's what _get_pids_by_name does)
output, and also the contents of /var/tmp/pmie (if you could make a
tarball, that'd be great). And any additional diagnostics (from using
"-V -V" options) that give hints as to why the pmie processes were
started or not stopped.

cheers.

-- 
Nathan

--=-IjpXYaVQm9Bkbn9mTlb4
Content-Disposition: attachment; filename=pmie_check.sh
Content-Type: application/x-shellscript; name=pmie_check.sh
Content-Transfer-Encoding: 7bit

#! /bin/sh
#Tag 0x00010D13
#
# Copyright (c) 1998-2000,2003 Silicon Graphics, Inc. All Rights Reserved.
#
# This program is free software; you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the
# Free Software Foundation; either version 2 of the License, or (at your
# option) any later version.
#
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
# or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
# for more details.
#
# You should have received a copy of the GNU General Public License along
# with this program; if not, write to the Free Software Foundation, Inc.,
# 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
#
# Contact information: Silicon Graphics, Inc., 1500 Crittenden Lane,
# Mountain View, CA 94043, USA, or: http://www.sgi.com
#
# $Id: pmie_check.sh,v 1.9 2005/01/19 00:04:20 kenmcd Exp $
#
# Administrative script to check pmie processes are alive, and restart
# them as required.
#

# Get standard environment
. /etc/pcp.env

PMIE=pmie

# added to handle problem when /var/log/pcp is a symlink, as first
# reported by Micah_Altman@harvard.edu in Nov 2001
#
_unsymlink_path()
{
    [ -z "$1" ] && return
    __d=`dirname $1`
    __real_d=`cd $__d 2>/dev/null && $PWDCMND`
    if [ -z "$__real_d" ]
    then
        echo $1
    else
        echo $__real_d/`basename $1`
    fi
}

have_pmieconf=false
if which pmieconf >/dev/null 2>&1
then
    have_pmieconf=true
fi

# error messages should go to stderr, not the GUI notifiers
#
unset PCP_STDERR

# constant setup
#
tmp=/tmp/$$
status=0
echo >$tmp.lock
trap "rm -f \`[ -f $tmp.lock ] && cat $tmp.lock\` $tmp.*; exit \$status" 0 1 2 3 15
prog=`basename $0`

# control file for pmie administration ... edit the entries in this
# file to reflect your local configuration
#
CONTROL=$PCP_VAR_DIR/config/pmie/control

# determine real name for localhost
LOCALHOSTNAME=`hostname | sed -e 's/\..*//'`
[ -z "$LOCALHOSTNAME" ] && LOCALHOSTNAME=localhost

# determine path for pwd command to override shell built-in
# (see BugWorks ID #595416).
PWDCMND=`which pwd 2>/dev/null | $PCP_AWK_PROG '
BEGIN           { i = 0 }
/ not in /      { i = 1 }
/ aliased to /  { i = 1 }
                { if ( i == 0 ) print }
'`
if [ -z "$PWDCMND" ]
then
    # Looks like we have no choice here...
    # force it to a known IRIX location
    PWDCMND=/bin/pwd
fi

# determine whether SGI Embedded Support Partner events need to be used
CONFARGS="-F"
if which esplogger >/dev/null 2>&1
then
    CONFARGS='m global syslog_prefix $esp_prefix$'
fi

# option parsing
#
SHOWME=false
MV=mv
RM=rm
CP=cp
KILL=kill
TERSE=false
VERBOSE=false
VERY_VERBOSE=false
START_PMIE=true
usage="Usage: $prog [-NsTV] [-c control]"
while getopts c:NsTV? c
do
    case $c in
        c)  CONTROL="$OPTARG"
            ;;
        N)  SHOWME=true
            MV="echo + mv"
            RM="echo + rm"
            CP="echo + cp"
            KILL="echo + kill"
            ;;
        s)  START_PMIE=false
            ;;
        T)  TERSE=true
            ;;
        V)  if $VERBOSE
            then
                VERY_VERBOSE=true
            else
                VERBOSE=true
            fi
            ;;
        ?)  echo "$usage"
            status=1
            exit
            ;;
    esac
done
shift `expr $OPTIND - 1`

if [ $# -ne 0 ]
then
    echo "$usage"
    status=1
    exit
fi

_error()
{
    echo "$prog: [$CONTROL:$line]"
    echo "Error: $1"
    echo "... automated performance reasoning for host \"$host\" unchanged"
    touch $tmp.err
}

_warning()
{
    echo "$prog [$CONTROL:$line]"
    echo "Warning: $1"
}

_message()
{
    case $1 in
        'restart')
            $PCP_ECHO_PROG $PCP_ECHO_N "Restarting pmie for host \"$host\" ..."
            ;;
    esac
}

_lock()
{
    # demand mutual exclusion
    #
    fail=true
    rm -f $tmp.stamp
    for try in 1 2 3 4
    do
        if pmlock -v $logfile.lock >$tmp.out
        then
            echo $logfile.lock >$tmp.lock
            fail=false
            break
        else
            if [ ! -f $tmp.stamp ]
            then
                touch -t `pmdate -30M %Y%m%d%H%M` $tmp.stamp
            fi
            if [ -n "`find $logfile.lock ! -newer $tmp.stamp -print 2>/dev/null`" ]
            then
                _warning "removing lock file older than 30 minutes"
                ls -l $logfile.lock
                rm -f $logfile.lock
            fi
        fi
        sleep 5
    done

    if $fail
    then
        # failed to gain mutex lock
        #
        if [ -f $logfile.lock ]
        then
            _warning "is another PCP cron job running concurrently?"
            ls -l $logfile.lock
        else
            echo "$prog: `cat $tmp.out`"
        fi
        _warning "failed to acquire exclusive lock ($logfile.lock) ..."
        continue
    fi
}

_unlock()
{
    rm -f $logfile.lock
    echo >$tmp.lock
}

_check_logfile()
{
    if [ ! \
        -f $logfile ]
    then
        echo "$prog: Error: cannot find pmie output file at \"$logfile\""
        if $TERSE
        then
            :
        else
            logdir=`dirname $logfile`
            echo "Directory (`cd $logdir; $PWDCMND`) contents:"
            LC_TIME=POSIX ls -la $logdir
        fi
    else
        echo "Contents of pmie output file \"$logfile\" ..."
        cat $logfile
    fi
}

_check_pmie_version()
{
    # the -C option was introduced at the same time as the $PCP_TMP_DIR/pmie
    # stats file support (required for pmie_check), so if this produces a
    # non-zero exit status, bail out
    #
    if $PMIE -C /dev/null >/dev/null 2>&1
    then
        :
    else
        binary=`which $PMIE`
        echo "$prog: Error: wrong version of $binary installed"
        cat - <$tmp.subsys
        if [ -s $tmp.subsys ]
        then
            echo "Currently $binary is installed from these subsystem(s):"
            echo versions `cat $tmp.subsys` $tmp.out 2>&1
        then
            if grep "No such file or directory" $tmp.out >/dev/null
            then
                :
            else
                sleep 5
                $VERBOSE && echo " done"
                return 0
            fi
        fi

        _plist=`_get_pids_by_name pmie`
        _found=false
        for _p in `echo $_plist`
        do
            [ $_p -eq $1 ] && _found=true
        done

        if $_found
        then
            # process still here, just hasn't created its status file
            # yet, try again
            :
        else
            $VERBOSE || _message restart
            echo " process exited!"
            if $TERSE
            then
                :
            else
                echo "$prog: Error: failed to restart pmie"
                echo "Current pmie processes:"
                ps $PCP_PS_ALL_FLAGS | tee $tmp.tmp | sed -n -e 1p
                for _p in `echo $_plist`
                do
                    sed -n -e "/^[ ]*[^ ]* [ ]*$_p /p" < $tmp.tmp
                done
                echo
            fi
            _check_logfile
            return 1
        fi
        fi
        sleep 5
        i=`expr $i + 5`
    done
    $VERBOSE || _message restart
    echo " timed out waiting!"
    if $TERSE
    then
        :
    else
        sed -e 's/^/ /' $tmp.out
    fi
    _check_logfile
    return 1
}

if $START_PMIE
then
    # ensure we have a pmie binary which supports the features we need
    #
    _check_pmie_version
else
    # if pmie has never been started, there's no work to do to stop it
    #
    [ ! -d $PCP_TMP_DIR/pmie ] && exit
    pmpost "stop pmie from $prog"
fi

if [ ! \
    -f $CONTROL ]
then
    echo "$prog: Error: cannot find control file ($CONTROL)"
    status=1
    exit
fi

# 1.0 is the first release, and the version is set in the control file
# with a $version=x.y line
#
version=1.0
eval `grep '^version=' $CONTROL | sort -rn`
if [ $version != "1.0" ]
then
    _error "unsupported version (got $version, expected 1.0)"
    status=1
    exit
fi

echo >$tmp.dir
rm -f $tmp.err $tmp.pmies

line=0
cat $CONTROL \
    | sed -e "s/LOCALHOSTNAME/$LOCALHOSTNAME/g" \
          -e "s;PCP_LOG_DIR;$PCP_LOG_DIR;g" \
    | while read host socks logfile args
do
    logfile=`_unsymlink_path $logfile`
    line=`expr $line + 1`
    case "$host" in
        \#*|'') # comment or empty
            continue
            ;;
        \$*)    # in-line variable assignment
            $SHOWME && echo "# $host $socks $logfile $args"
            cmd=`echo "$host $socks $logfile $args" \
                | sed -n \
                    -e "/='/s/\(='[^']*'\).*/\1/" \
                    -e '/="/s/\(="[^"]*"\).*/\1/' \
                    -e '/=[^"'"'"']/s/[;&<>|].*$//' \
                    -e '/^\\$[A-Za-z][A-Za-z0-9_]*=/{
s/^\\$//
s/^\([A-Za-z][A-Za-z0-9_]*\)=/export \1; \1=/p
}'`
            if [ -z "$cmd" ]
            then
                # in-line command, not a variable assignment
                _warning "in-line command is not a variable assignment, line ignored"
            else
                case "$cmd" in
                    'export PATH;'*)
                        _warning "cannot change \$PATH, line ignored"
                        ;;
                    'export IFS;'*)
                        _warning "cannot change \$IFS, line ignored"
                        ;;
                    *)
                        $SHOWME && echo "+ $cmd"
                        eval $cmd
                        ;;
                esac
            fi
            continue
            ;;
    esac

    if [ -z "$socks" -o -z "$logfile" -o -z "$args" ]
    then
        _error "insufficient fields in control file record"
        continue
    fi

    [ $VERY_VERBOSE = "true" ] && echo "Check pmie -h $host -l $logfile ..."

    # make sure output directory exists
    #
    dir=`dirname $logfile`
    if [ ! -d $dir ]
    then
        mkdir -p $dir >$tmp.err 2>&1
        if [ ! -d $dir ]
        then
            cat $tmp.err
            _error "cannot create directory ($dir) for pmie log file"
        fi
    fi
    [ ! -d $dir ] && continue

    cd $dir
    dir=`$PWDCMND`
    $SHOWME && echo "+ cd $dir"

    if [ ! \
        -w $dir ]
    then
        _warning "no write access in $dir, skip lock file processing"
        ls -ld $dir
    else
        _lock
    fi

    # match $logfile and $fqdn from control file to running pmies
    pid=""
    fqdn=`pmhostname $host`
    for file in $PCP_TMP_DIR/pmie/[0-9]*
    do
        [ "$file" = "$PCP_TMP_DIR/pmie/[0-9]*" ] && continue
        $VERY_VERBOSE && $PCP_ECHO_PROG $PCP_ECHO_N "... try $file: ""$PCP_ECHO_C"

        p_id=`echo $file | sed -e 's,.*/,,'`
        p_logfile=""
        p_pmcd_host=""

        # throw away stderr in case $file has been removed by now
        eval `tr '\0' '\012' < $file 2>/dev/null | sed -e '/^$/d' | sed -e 3q \
            | $PCP_AWK_PROG '
NR == 2 { printf "p_logfile=\"%s\"\n", $0; next }
NR == 3 { printf "p_pmcd_host=\"%s\"\n", $0; next }
        { next }'`

        p_logfile=`_unsymlink_path $p_logfile`
        if [ "$p_logfile" != $logfile ]
        then
            $VERY_VERBOSE && echo "different logfile, skip"
        elif [ "$p_pmcd_host" != "$fqdn" ]
        then
            $VERY_VERBOSE && echo "different host, skip"
        elif _get_pids_by_name pmie | grep "^$p_id\$" >/dev/null
        then
            $VERY_VERBOSE && echo "pmie process $p_id identified, OK"
            pid=$p_id
            break
        else
            $VERY_VERBOSE && echo "pmie process $p_id not running, skip"
        fi
    done

    if [ -z "$pid" -a $START_PMIE = true ]
    then
        configfile=`echo $args | sed -n -e 's/^/ /' -e 's/[ ][ ]*/ /g' -e 's/-c /-c/' -e 's/.* -c\([^ ]*\).*/\1/p'`
        if [ ! -z "$configfile" ]
        then
            # if this is a relative path and not relative to cwd,
            # substitute in the default pmie search location.
            #
            if [ ! \
                -f "$configfile" -a "`basename $configfile`" = "$configfile" ]
            then
                configfile="$PCP_VAR_DIR/config/pmie/$configfile"
            fi

            if [ -f $configfile ]
            then
                # look for "magic" string at start of file
                #
                if sed 1q $configfile | grep '^#pmieconf-rules [0-9]' >/dev/null
                then
                    # pmieconf file, see if re-generation is needed
                    # Note that pmieconf is in the pcp-pro product (not open source)
                    #
                    cp $configfile $tmp.pmie
                    if $have_pmieconf
                    then
                        if pmieconf -f $tmp.pmie $CONFARGS >$tmp.diag 2>&1
                        then
                            grep -v "generated by pmieconf" $configfile >$tmp.old
                            grep -v "generated by pmieconf" $tmp.pmie >$tmp.new
                            if diff $tmp.old $tmp.new >/dev/null
                            then
                                :
                            else
                                if [ -w $configfile ]
                                then
                                    $VERBOSE && echo "Reconfigured: \"$configfile\" (pmieconf)"
                                    eval $CP $tmp.pmie $configfile
                                else
                                    _warning "no write access to pmieconf file \"$configfile\", skip reconfiguration"
                                    ls -l $configfile
                                fi
                            fi
                        else
                            _warning "pmieconf failed to reconfigure \"$configfile\""
                            # show the diagnostics with the temporary file name
                            # mapped back to the real configuration file name
                            sed -e "s;$tmp.pmie;$configfile;g" $tmp.diag
                            echo "=== start pmieconf file ==="
                            cat $tmp.pmie
                            echo "=== end pmieconf file ==="
                        fi
                    fi
                fi
            else
                # file does not exist, generate it, if possible
                # Note that pmieconf is in the pcp-pro product (not open source)
                #
                if $have_pmieconf
                then
                    if pmieconf -f $configfile $CONFARGS >$tmp.diag 2>&1
                    then
                        :
                    else
                        _warning "pmieconf failed to generate \"$configfile\""
                        cat $tmp.diag
                        echo "=== start pmieconf file ==="
                        cat $configfile
                        echo "=== end pmieconf file ==="
                    fi
                fi
            fi
        fi

        args="-h $host -l $logfile $args"

        $VERBOSE && _message restart

        sock_me=''
        if [ "$socks" = y ]
        then
            # only check for pmsocks if it's specified in the control file
            have_pmsocks=false
            if which pmsocks >/dev/null 2>&1
            then
                # check if pmsocks has been set up correctly
                if pmsocks ls >/dev/null 2>&1
                then
                    have_pmsocks=true
                fi
            fi

            if $have_pmsocks
            then
                sock_me="pmsocks "
            else
                echo "$prog: Warning: no pmsocks available, would run without"
                sock_me=""
            fi
        fi

        [ -f $logfile ] && eval $MV -f $logfile $logfile.prior

        if $SHOWME
        then
            $VERBOSE && echo
            echo "+ ${sock_me}$PMIE -b \
$args"
            _unlock
            continue
        else
            # since this is launched as a sort of daemon, any output should
            # go on pmie's stderr, i.e. $logfile ... use -b for this
            #
            $VERY_VERBOSE && ( echo; $PCP_ECHO_PROG $PCP_ECHO_N "+ ${sock_me}$PMIE -b $args""$PCP_ECHO_C"; echo "..." )
            pmpost "start pmie from $prog for host $host"
            ${sock_me}$PMIE -b $args &
            pid=$!
        fi

        # wait for pmie to get started, and check on its health
        _check_pmie $pid

    elif [ ! -z "$pid" -a $START_PMIE = false ]
    then
        # Send pmie a SIGTERM, which is noted as a pending shutdown.
        # Add pid to list of pmies sent SIGTERM - may need SIGKILL later.
        #
        $VERY_VERBOSE && echo "+ $KILL -TERM $pid"
        eval $KILL -TERM $pid
        $PCP_ECHO_PROG $PCP_ECHO_N "$pid ""$PCP_ECHO_C" >> $tmp.pmies
    fi

    _unlock
done

# check all the SIGTERM'd pmies really died - if not, use a bigger hammer.
#
if $SHOWME
then
    :
elif [ $START_PMIE = false -a -s $tmp.pmies ]
then
    pmielist=`cat $tmp.pmies`
    if ps -p "$pmielist" >/dev/null 2>&1
    then
        $VERY_VERBOSE && ( echo; $PCP_ECHO_PROG $PCP_ECHO_N "+ $KILL -KILL `cat $tmp.pmies` ...""$PCP_ECHO_C" )
        eval $KILL -KILL $pmielist >/dev/null 2>&1
        sleep 3         # give them a chance to go
        if ps -f -p "$pmielist" >$tmp.alive 2>&1
        then
            echo "$prog: Error: pmie process(es) will not die"
            cat $tmp.alive
            status=1
        fi
    fi
fi

[ -f $tmp.err ] && status=1
exit

--=-IjpXYaVQm9Bkbn9mTlb4--
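
The matching step in the attached script reads each file under
$PCP_TMP_DIR/pmie, splits it on NUL bytes, and treats the second and third
non-empty strings as the logfile and the pmcd host. A stand-alone sketch of
that step, which might help when checking by hand whether two running pmies
claim the same logfile/host pair, could look like the following. The two
function names and the directory argument are hypothetical (they are not
part of PCP), and the file layout is assumed to be no more than what the
script's own tr/sed/awk pipeline relies on. Logfile paths containing spaces
are not handled.

```shell
#!/bin/sh
# Hypothetical helper (not part of PCP): print "pid logfile host" for each
# numerically named stats file in the directory given as $1.
# Assumption: as in pmie_check, the 2nd and 3rd non-empty NUL-separated
# strings in each file are the logfile and the pmcd host.
show_pmie_stats()
{
    dir=$1
    for file in "$dir"/[0-9]*
    do
        # skip the unexpanded pattern and anything that is not a plain file
        [ -f "$file" ] || continue
        pid=`basename "$file"`
        tr '\0' '\012' < "$file" | sed -e '/^$/d' | sed -e 3q \
        | awk -v pid="$pid" '
            NR == 2 { logfile = $0 }
            NR == 3 { host = $0 }
            END     { if (NR >= 3) print pid, logfile, host }'
    done
}

# Hypothetical helper: print each "logfile host" pair that is claimed by
# more than one stats file in the directory given as $1.
find_shared_pmie_targets()
{
    show_pmie_stats "$1" | awk '{ print $2, $3 }' | sort | uniq -d
}

# Typical use, with the PCP environment already sourced:
#   find_shared_pmie_targets "$PCP_TMP_DIR/pmie"
```

Any line printed by find_shared_pmie_targets would point at a pair of stats
files worth comparing against the ps output.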