From nscott@aconex.com Sun Jul 1 22:48:13 2007 Received: with ECARTIS (v1.0.0; list pcp); Sun, 01 Jul 2007 22:48:22 -0700 (PDT) Received: from postoffice.aconex.com (mail.app.aconex.com [203.89.192.138]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l625mAtL007354 for ; Sun, 1 Jul 2007 22:48:12 -0700 Received: from edge.yarra.acx (unknown [203.89.192.141]) by postoffice.aconex.com (Postfix) with ESMTP id C4DA592C3C8; Mon, 2 Jul 2007 15:48:10 +1000 (EST) Subject: Re: Review: PCP & pmlogger take too long to start From: Nathan Scott Reply-To: nscott@aconex.com To: Michael Newton Cc: pcp@oss.sgi.com In-Reply-To: References: <1182996127.15488.102.camel@edge.yarra.acx> Content-Type: multipart/mixed; boundary="=-eMwRGrfpgKwjtxy6S0KQ" Organization: Aconex Date: Mon, 02 Jul 2007 15:47:18 +1000 Message-Id: <1183355238.15488.217.camel@edge.yarra.acx> Mime-Version: 1.0 X-Mailer: Evolution 2.6.3 X-archive-position: 1287 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: nscott@aconex.com Precedence: bulk X-list: pcp --=-eMwRGrfpgKwjtxy6S0KQ Content-Type: text/plain Content-Transfer-Encoding: 7bit Hi Michael, More complete review follows (thanks for implementing pmsleep btw)... On Fri, 2007-06-29 at 18:11 +1000, Michael Newton wrote: > > =========================================================================== > mgmt/pcp/man/man1/pmsleep.1 > =========================================================================== > > --- a/mgmt/pcp/man/man1/pmsleep.1 2006-06-17 00:58:24.000000000 > +1000 > +++ b/mgmt/pcp/man/man1/pmsleep.1 2007-06-29 15:48:28.024750676 > +1000 > @@ -0,0 +1,41 @@ > +'\"macro stdmacro > +.\" > +.\" Copyright (c) 2007 Silicon Graphics, Inc. All Rights Reserved. > +.\" > +.\" $Id$ > +.ie \(.g \{\ > +.\" ... groff (hack for khelpcenter, man2html, etc.) 
> +.TH PMSLEEP 1 "SGI" "Performance Co-Pilot"
> +\}
> +.el \{\
> +.if \nX=0 .ds x} PMSLEEP 1 "SGI" "Performance Co-Pilot"
> +.if \nX=1 .ds x} PMSLEEP 1 "Performance Co-Pilot"
> +.if \nX=2 .ds x} PMSLEEP 1 "" "\&"
> +.if \nX=3 .ds x} PMSLEEP "" "" "\&"
> +.TH \*(x}
> +.rr X
> +\}
> +.SH NAME
> +\f3pmsleep\f1 \- portable subsecond-capable sleep
> +.\" literals use .B or \f3
> +.\" arguments use .I or \f2
> +.SH SYNOPSIS
> +.B $PCP_BINADM_DIR/pmsleep
> +.I interval
> +.SH DESCRIPTION
> +.B pmsleep
> +sleeps for
> +.I interval.
> +The
> +.I interval
> +argument follows the syntax described in
> +.BR PCPIntro (1)
> +for
> +.B \-t,
> +and in the simplest form may be an unsigned integer
> +or floating point constant
> +(the implied units in this case are seconds).
> +
> +.PP
> +The exit status is 0 for success, or 1 for a malformed command line.
> +If the underlying nanosleep fails, an errno is returned.

The SEE ALSO section could probably reference sleep(1) and
nanosleep(2).

> ===========================================================================
> mgmt/pcp/src/pmcd/rc_pcp
> ===========================================================================
>
> --- a/mgmt/pcp/src/pmcd/rc_pcp 2007-06-29 18:09:45.000000000 +1000
> +++ b/mgmt/pcp/src/pmcd/rc_pcp 2007-06-29 16:07:49.625951131 +1000
> @@ -100,6 +100,8 @@
> ;;
> esac
>
> +SLEEPCMND="$PCP_BINADM_DIR/pmsleep 0.1"
> +

This variable just seems to be obfuscating the logic, I'd remove it.
PCP_BINADM_DIR is set in the PATH in /etc/pcp.env, so the full path isn't
needed there. Since the 0.1 argument is important in several of the
uses (as there is control flow assuming tenths of a second in places),
the argument should be expanded close to the logic that's using it too.
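As a rough illustration of the suggestion above (a sketch, not code from either patch): keeping the `pmsleep 0.1` call next to the counter that assumes tenths of a second might look like the following. The `_done` helper and the 2 second timeout are made up for the example, the fallback to `sleep` assumes a sleep(1) that accepts fractional seconds, and shell arithmetic is used in place of `expr` for brevity.

```shell
# Stand-in so the sketch runs without PCP installed. Assumption: the local
# sleep(1) accepts fractional seconds (GNU and BSD sleep do).
command -v pmsleep >/dev/null 2>&1 || pmsleep() { sleep "$1"; }

# Hypothetical condition for the example: succeeds on the third poll.
polls=0
_done() { polls=$((polls + 1)); [ "$polls" -ge 3 ]; }

delay=20                # tenths of a second, i.e. a 2 second timeout
while [ "$delay" -gt 0 ]
do
    _done && break
    pmsleep 0.1         # step size sits next to the counter it scales
    delay=$((delay - 1))
done
echo "polls=$polls remaining=$delay"
```

With the made-up condition above, the loop stops on the third poll and prints `polls=3 remaining=18`; anyone reading the loop can see both the unit and the step without chasing a variable defined a few hundred lines earlier.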
> _pmcd_logfile() > { > default=$RUNDIR/pmcd.log > @@ -383,16 +385,25 @@ > fi > $ECHO $PCP_ECHO_N "Waiting for PMCD to > terminate ...""$PCP_ECHO_C" > gone=0 > - for i in 1 2 3 4 5 6 > + i=0 > + j=0 > + while : > do > - sleep 3 > _get_pids_by_name pmcd >$tmp.tmp > if [ ! -s $tmp.tmp ] > then > gone=1 > break > fi > - $ECHO $PCP_ECHO_N ".""$PCP_ECHO_C" > + i=`expr $i + 1` > + if [ $i -ge 10 ] > + then > + i=0 > + [ $j -ge $delay ] && break > + j=`expr $j + 1` > + $ECHO $PCP_ECHO_N ".""$PCP_ECHO_C" > + fi > + $SLEEPCMND > done > if [ $gone != 1 ] # It just WON'T DIE, give up. Hmmm, thats not right. Firstly, $delay doesn't exist in this script (it looks like this script snippet has been incorrectly duplicated), so you'll get a shell-syntax-error if that branch (with $j) is taken. Secondly, that branch is b0rken in that it will only ever print one '.' character - its meant to print one dot every iteration (or every second now, I guess). You'll want to use an expr modulo (%) there. Thirdly, the while logic is strange - you could just use the normal loop control mechanism (rather than the "while :" and explicit break statements). (I've attached an alternate patch that implements these things). > =========================================================================== > mgmt/pcp/src/pmie/pmie_check.sh > =========================================================================== > =========================================================================== > mgmt/pcp/src/pmlogctl/pmlogger_check.sh > =========================================================================== These two scripts have the same sorts of problems, fixed in attached patch. > > =========================================================================== > mgmt/pcp/src/pmsleep/pmsleep.c > =========================================================================== > ... 
> +int
> +main(int argc, char **argv)
> +{
> +    struct timespec rqt;
> +    struct timeval delta;
> +    int r = 0;
> +    char *msg;
> +
> +    if (argc == 2) {
> +        if (pmParseInterval(argv[1], &delta, &msg) < 0) {
> +            fputs(msg, stderr);
> +            free(msg);
> +        } else {
> +            rqt.tv_sec = delta.tv_sec;
> +            rqt.tv_nsec = delta.tv_usec * 1000;
> +            if (0 != nanosleep(&rqt, NULL))
> +                r = errno;
> +
> +            exit(r);
> +        }
> +    }
> +    fprintf(stderr, "Usage: pmsleep [-v] interval\n");

There's no -v option. The 'r' variable isn't really necessary -
see my attached patch that makes this slightly simpler.

Was it intentional to print a usage message when a pmParseInterval error
occurs? Seems a bit odd - *shrug*, not a big deal obviously.

BTW, good work on finding the DSO agent speedup - that will help
across the board (definitely makes pmcd stop a lot quicker for me).

cheers.

ps: the attached patch is an incremental patch (on top of yours),
applied to my current git tree - so, you may see some fuzzy patch
matching due to other pmie start script patches in particular.

--
Nathan

--=-eMwRGrfpgKwjtxy6S0KQ
Content-Disposition: attachment; filename=update_faster_startup
Content-Type: text/x-patch; name=update_faster_startup; charset=utf-8
Content-Transfer-Encoding: 7bit

Index: pcp/man/man1/pmsleep.1
===================================================================
--- pcp.orig/man/man1/pmsleep.1 2007-07-02 15:33:26.988523000 +1000
+++ pcp/man/man1/pmsleep.1 2007-07-02 15:33:27.180523000 +1000
@@ -35,7 +35,11 @@ for
 and in the simplest form may be an unsigned integer
 or floating point constant
 (the implied units in this case are seconds).
-
-.PP
+.SH DIAGNOSTICS
 The exit status is 0 for success, or 1 for a malformed command line.
-If the underlying nanosleep fails, an errno is returned.
+If the underlying
+.B nanosleep (2)
+system call fails, an errno is returned.
+.SH SEE ALSO
+.BR sleep (1),
+.BR nanosleep (3).
Index: pcp/src/pmcd/rc_pcp =================================================================== --- pcp.orig/src/pmcd/rc_pcp 2007-07-02 15:33:27.036523000 +1000 +++ pcp/src/pmcd/rc_pcp 2007-07-02 15:38:31.376523000 +1000 @@ -117,8 +117,6 @@ in ;; esac -SLEEPCMND="$PCP_BINADM_DIR/pmsleep 0.1" - _pmcd_logfile() { default=$RUNDIR/pmcd.log @@ -403,28 +401,16 @@ _shutdown() $PCP_KILLALL_PROG -TERM pmcd > /dev/null 2>&1 fi $ECHO $PCP_ECHO_N "Waiting for PMCD to terminate ...""$PCP_ECHO_C" - gone=0 - i=0 - j=0 - while : + delay=200 # tenths of a second + while [ $delay -gt 0 ] do _get_pids_by_name pmcd >$tmp.tmp - if [ ! -s $tmp.tmp ] - then - gone=1 - break - fi - i=`expr $i + 1` - if [ $i -ge 10 ] - then - i=0 - [ $j -ge $delay ] && break - j=`expr $j + 1` - $ECHO $PCP_ECHO_N ".""$PCP_ECHO_C" - fi - $SLEEPCMND + [ ! -s $tmp.tmp ] && break + pmsleep 0.1 + delay=`expr $delay - 1` + [ `expr $delay % 10` -ne 0 ] || $ECHO $PCP_ECHO_N ".""$PCP_ECHO_C" done - if [ $gone != 1 ] # It just WON'T DIE, give up. + if [ $delay -eq 0 ] # It just WON'T DIE, give up. then echo "Process ..." cat $tmp.tmp Index: pcp/src/pmsleep/pmsleep.c =================================================================== --- pcp.orig/src/pmsleep/pmsleep.c 2007-07-02 15:33:27.168523000 +1000 +++ pcp/src/pmsleep/pmsleep.c 2007-07-02 15:33:27.212523000 +1000 @@ -13,7 +13,6 @@ main(int argc, char **argv) { struct timespec rqt; struct timeval delta; - int r = 0; char *msg; if (argc == 2) { @@ -23,13 +22,11 @@ main(int argc, char **argv) } else { rqt.tv_sec = delta.tv_sec; rqt.tv_nsec = delta.tv_usec * 1000; - if (0 != nanosleep(&rqt, NULL)) - r = errno; - - exit(r); + return (nanosleep(&rqt, NULL) == 0) ? 
0 : errno; } + } else { + fprintf(stderr, "Usage: pmsleep \n"); } - fprintf(stderr, "Usage: pmsleep [-v] interval\n"); exit(1); /*NOTREACHED*/ } Index: pcp/src/pmie/pmie_check.sh =================================================================== --- pcp.orig/src/pmie/pmie_check.sh 2007-07-02 15:33:27.072523000 +1000 +++ pcp/src/pmie/pmie_check.sh 2007-07-02 15:33:27.228523000 +1000 @@ -31,8 +31,6 @@ PMIE=pmie -SLEEPCMND="$PCP_BINADM_DIR/pmsleep 0.1" - # added to handle problem when /var/log/pcp is a symlink, as first # reported by Micah_Altman@harvard.edu in Nov 2001 # @@ -177,15 +175,13 @@ _lock() { # demand mutual exclusion # - fail=true rm -f $tmp.stamp - i=0 - while : + delay=200 # tenths of a second + while [ $delay -ne 0 ] do if pmlock -v $logfile.lock >$tmp.out then echo $logfile.lock >$tmp.lock - fail=false break else if [ ! -f $tmp.stamp ] @@ -199,12 +195,11 @@ _lock() rm -f $logfile.lock fi fi - [ $i -ge 200 ] && break #tenths of a sec - $SLEEPCMND - i=`expr $i + 1` + pmsleep 0.1 + delay=`expr $delay - 1` done - if $fail + if [ $delay -eq 0 ] then # failed to gain mutex lock # @@ -311,10 +306,8 @@ _check_pmie() # wait for maximum time of a connection and 20 requests # - delay=`expr $delay + 20 \* $x` - i=0 - j=0 - while : + delay=`expr \( $delay + 20 \* $x \) \* 10` # tenths of a second + while [ $delay -ne 0 ] do if [ -f $logfile ] then @@ -327,7 +320,7 @@ _check_pmie() then : else - $SLEEPCMND + pmsleep 0.1 $VERBOSE && echo " done" return 0 fi @@ -365,15 +358,10 @@ _check_pmie() return 1 fi fi - i=`expr $i + 1` - if [ $i -ge 10 ] - then - i=0 - [ $j -ge $delay ] && break - j=`expr $j + 1` - $VERBOSE && $PCP_ECHO_PROG $PCP_ECHO_N ".""$PCP_ECHO_C" - fi - $SLEEPCMND + pmsleep 0.1 + delay=`expr $delay - 1` + $VERBOSE && [ `expr $delay % 10` -eq 0 ] && \ + $PCP_ECHO_PROG $PCP_ECHO_N ".""$PCP_ECHO_C" done $VERBOSE || _message restart echo " timed out waiting!" 
@@ -681,14 +669,14 @@ then then $VERY_VERBOSE && ( echo; $PCP_ECHO_PROG $PCP_ECHO_N "+ $KILL -KILL `cat $tmp.pmies` ...""$PCP_ECHO_C" ) eval $KILL -KILL $pmielist >/dev/null 2>&1 - i=0 + delay=30 # tenths of a second while ps -f -p "$pmielist" >$tmp.alive 2>&1 do - if [ $i -lt 30 ] + if [ $delay -gt 0 ] then - $SLEEPCMND - i=`expr $i + 1` - continue; + pmsleep 0.1 + delay=`expr $delay - 1` + continue fi echo "$prog: Error: pmie process(es) will not die" cat $tmp.alive Index: pcp/src/pmlogctl/pmlogger_check.sh =================================================================== --- pcp.orig/src/pmlogctl/pmlogger_check.sh 2007-07-02 15:33:27.088523000 +1000 +++ pcp/src/pmlogctl/pmlogger_check.sh 2007-07-02 15:33:27.244523000 +1000 @@ -68,8 +68,6 @@ then PWDCMND=/bin/pwd fi -SLEEPCMND="$PCP_BINADM_DIR/pmsleep 0.1" - # default location # logfile=pmlogger.log @@ -211,10 +209,8 @@ _check_logger() # wait for maximum time of a connection and 20 requests # - delay=`expr $delay + 20 \* $x` - i=0 - j=0 - while : + delay=`expr \( $delay + 20 \* $x \) \* 10` # tenths of a second + while [ $delay -gt 0 ] do if [ -f $logfile ] then @@ -226,7 +222,7 @@ _check_logger() then : else - $SLEEPCMND + pmsleep 0.1 $VERBOSE && echo " done" return 0 fi @@ -263,15 +259,10 @@ _check_logger() return 1 fi fi - i=`expr $i + 1` - if [ $i -ge 10 ] - then - i=0 - [ $j -ge $delay ] && break - j=`expr $j + 1` - $VERBOSE && $PCP_ECHO_PROG $PCP_ECHO_N ".""$PCP_ECHO_C" - fi - $SLEEPCMND + pmsleep 0.1 + delay=`expr $delay - 1` + $VERBOSE && [ `expr $delay % 10` -eq 0 ] && \ + $PCP_ECHO_PROG $PCP_ECHO_N ".""$PCP_ECHO_C" done $VERBOSE || _message restart echo " timed out waiting!" 
@@ -403,10 +394,9 @@ s/^\([A-Za-z][A-Za-z0-9_]*\)=/export \1; else # demand mutual exclusion # - fail=true rm -f $tmp.stamp - i=0 - while : + delay=200 # tenths of a second + while [ $delay -gt 0 ] do if pmlock -v lock >$tmp.out then @@ -434,12 +424,11 @@ s/^\([A-Za-z][A-Za-z0-9_]*\)=/export \1; rm -f lock fi fi - [ $i -ge 200 ] && break #tenths of a sec - $SLEEPCMND - i=`expr $i + 1` + pmsleep 0.1 + delay=`expr $delay - 1` done - if $fail + if [ $delay -eq 0 ] then # failed to gain mutex lock # --=-eMwRGrfpgKwjtxy6S0KQ-- From nscott@aconex.com Sun Jul 1 23:03:13 2007 Received: with ECARTIS (v1.0.0; list pcp); Sun, 01 Jul 2007 23:03:17 -0700 (PDT) Received: from postoffice.aconex.com (mail.app.aconex.com [203.89.192.138]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l6263BtL011516 for ; Sun, 1 Jul 2007 23:03:12 -0700 Received: from edge.yarra.acx (unknown [203.89.192.141]) by postoffice.aconex.com (Postfix) with ESMTP id 59DAC92C3E8; Mon, 2 Jul 2007 16:03:13 +1000 (EST) Subject: Re: Review: PCP & pmlogger take too long to start From: Nathan Scott Reply-To: nscott@aconex.com To: Michael Newton Cc: pcp@oss.sgi.com In-Reply-To: <1183355238.15488.217.camel@edge.yarra.acx> References: <1182996127.15488.102.camel@edge.yarra.acx> <1183355238.15488.217.camel@edge.yarra.acx> Content-Type: multipart/mixed; boundary="=-bp/LIFMRA/bErdqPmUu0" Organization: Aconex Date: Mon, 02 Jul 2007 16:02:21 +1000 Message-Id: <1183356141.15488.223.camel@edge.yarra.acx> Mime-Version: 1.0 X-Mailer: Evolution 2.6.3 X-archive-position: 1288 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: nscott@aconex.com Precedence: bulk X-list: pcp --=-bp/LIFMRA/bErdqPmUu0 Content-Type: text/plain Content-Transfer-Encoding: 7bit BTW, one other thing I noticed here in the pmie and pmlogger check scripts - theres a comment near the start of the loop in _check_logger() that doesn't match what the code does. 
And the code looks wrong - the comment is: # $logfile was previously removed, if it has appeared again # then we know pmlogger has started ... if not just sleep and # try again But what it actually seems to do is sleep when pmlogger has started, just prior to returning from the call (when we know pmlogger has started...). The attached patch fixes this up, and implements what the comment (correctly, I believe) says. [ This is an incremental patch on top of my previous patch. ] Thoughts? cheers. -- Nathan --=-bp/LIFMRA/bErdqPmUu0 Content-Disposition: attachment; filename=fix-startup-extra-sleep Content-Type: text/x-patch; name=fix-startup-extra-sleep; charset=utf-8 Content-Transfer-Encoding: 7bit Index: pcp/src/pmie/pmie_check.sh =================================================================== --- pcp.orig/src/pmie/pmie_check.sh 2007-07-02 15:48:09.952523000 +1000 +++ pcp/src/pmie/pmie_check.sh 2007-07-02 15:48:52.904523000 +1000 @@ -318,9 +318,8 @@ _check_pmie() then if grep "No such file or directory" $tmp.out >/dev/null then - : - else pmsleep 0.1 + else $VERBOSE && echo " done" return 0 fi Index: pcp/src/pmlogctl/pmlogger_check.sh =================================================================== --- pcp.orig/src/pmlogctl/pmlogger_check.sh 2007-07-02 15:48:09.976523000 +1000 +++ pcp/src/pmlogctl/pmlogger_check.sh 2007-07-02 15:49:07.056523000 +1000 @@ -220,9 +220,8 @@ _check_logger() # if echo "connect $1" | pmlc 2>&1 | grep "Unable to connect" >/dev/null then - : - else pmsleep 0.1 + else $VERBOSE && echo " done" return 0 fi --=-bp/LIFMRA/bErdqPmUu0-- From nscott@aconex.com Sun Jul 1 23:34:44 2007 Received: with ECARTIS (v1.0.0; list pcp); Sun, 01 Jul 2007 23:34:49 -0700 (PDT) Received: from postoffice.aconex.com (mail.app.aconex.com [203.89.192.138]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l626YgtL023441 for ; Sun, 1 Jul 2007 23:34:43 -0700 Received: from edge.yarra.acx (unknown [203.89.192.141]) by postoffice.aconex.com (Postfix) with 
ESMTP id 4AABE92C610 for ; Mon, 2 Jul 2007 16:34:44 +1000 (EST) Subject: pcp updates From: Nathan Scott Reply-To: nscott@aconex.com To: pcp@oss.sgi.com Content-Type: text/plain Organization: Aconex Date: Mon, 02 Jul 2007 16:33:52 +1000 Message-Id: <1183358032.15488.230.camel@edge.yarra.acx> Mime-Version: 1.0 X-Mailer: Evolution 2.6.3 Content-Transfer-Encoding: 7bit X-archive-position: 1289 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: nscott@aconex.com Precedence: bulk X-list: pcp Changes committed to git://oss.sgi.com:8090/nathans/pcp.git man/man1/GNUmakefile | 2 man/man1/pmsleep.1 | 51 ++++ src/GNUmakefile | 2 src/pmcd/rc_pcp | 45 +--- src/pmcd/src/agent.c | 2 src/pmdas/sample/help | 4 src/pmdas/sample/pmns | 4 src/pmdas/sample/src/sample.c | 95 ++++---- src/pmie/pmie_check.sh | 92 ++++---- src/pmie/src/GNUmakefile | 4 src/pmie/src/aggregate.sk | 10 src/pmie/src/andor.c | 448 +++++++++++++++++++++++++++++++++++++++++ src/pmie/src/andor.h | 37 +++ src/pmie/src/binary.sk | 6 src/pmie/src/fun.h | 9 src/pmie/src/merge.sk | 2 src/pmie/src/meta | 8 src/pmie/src/misc.sk | 2 src/pmie/src/unary.sk | 2 src/pmlogctl/pmlogger_check.sh | 64 ++--- src/pmsleep/GNUmakefile | 25 ++ src/pmsleep/pmsleep.c | 44 +++- 22 files changed, 773 insertions(+), 185 deletions(-) commit 98e963beda9238ee72325e0020de5e4654e864aa Author: Nathan Scott Date: Mon Jul 2 16:16:39 2007 +1000 Make source match comment in pmie/pmlogger check scripts wrt a startup delay. commit d4d42cf507ef74d23521d666696c2886d119908c Author: Nathan Scott Date: Mon Jul 2 16:15:37 2007 +1000 Fix/cleanup some areas of earlier startup performance improvements. commit 5f9fadea84ad92a62805a465638c5c73a5a98718 Author: Michael Newton Date: Mon Jul 2 16:13:06 2007 +1000 Performance improvements to pmlogger, pmie and pmcd startup times. 
commit 8a72679e01fdd3fe5d3a25f0dde6478115aa211e Author: Nathan Scott Date: Mon Jul 2 16:09:33 2007 +1000 Fix a pmie_check typo causing mis-identification of pmie processes. commit 605771bf897aa7fe4c7c79215d901bb6e5960dc6 Author: Ken McDonell Date: Mon Jul 2 12:19:17 2007 +1000 Allow pmie and/or operators to function with some data missing. This change modifies pmie to allow a logical OR expression to evalute to true when only once side of the expression tree can be evaluated, due to host down / instance unavailable / insufficient samples. Same for logical AND expressions, and evaluating to false. In order to test this modification, a new metric has been added into the sample agent - sample.darkness - which shares an instance domain with sample.color, but always returns no values available. From nscott@aconex.com Sun Jul 1 23:40:38 2007 Received: with ECARTIS (v1.0.0; list pcp); Sun, 01 Jul 2007 23:40:43 -0700 (PDT) Received: from postoffice.aconex.com (mail.app.aconex.com [203.89.192.138]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l626ebtL025327 for ; Sun, 1 Jul 2007 23:40:38 -0700 Received: from edge.yarra.acx (unknown [203.89.192.141]) by postoffice.aconex.com (Postfix) with ESMTP id 4D79792C3C8; Mon, 2 Jul 2007 16:40:39 +1000 (EST) Subject: Re: pmie spawning more than 1 instance per host From: Nathan Scott Reply-To: nscott@aconex.com To: Jan-Frode Myklebust Cc: pcp@oss.sgi.com In-Reply-To: <20070602204048.GA4067@lc4eb6380248654.ibm.com> References: <1180484426.6273.748.camel@edge> <20070530082218.GA6332@lc4eb6380248654.ibm.com> <1180589911.6273.770.camel@edge> <20070602204048.GA4067@lc4eb6380248654.ibm.com> Content-Type: text/plain Organization: Aconex Date: Mon, 02 Jul 2007 16:39:47 +1000 Message-Id: <1183358387.15488.236.camel@edge.yarra.acx> Mime-Version: 1.0 X-Mailer: Evolution 2.6.3 Content-Transfer-Encoding: 7bit X-archive-position: 1290 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com 
X-original-sender: nscott@aconex.com
Precedence: bulk
X-list: pcp

Hi Jan-Frode,

On Sat, 2007-06-02 at 22:40 +0200, Jan-Frode Myklebust wrote:
> On Thu, May 31, 2007 at 03:38:31PM +1000, Nathan Scott wrote:
> >
> > I've switched the script over to have these now, and also added
> > the additional "very verbose" (-V -V) diagnostics that the
> > pmlogger_check script has - could you try out the attached
> > script, in place of your current /usr/share/pcp/bin/pmie_check?
>
> I didn't replace /usr/share/pcp/bin/pmie_check, but rather put
> your script in /etc/cron.hourly/pmie_check.sh. Unfortunately
> it also leaks out new instances for already running pmie's.

I found a bug in this earlier version of my patch - you may have
better results with the one in my git tree ("nathans" branch of
git://oss.sgi.com:8090/nathans/pcp.git).

The other thing I came across on our RHEL4 production servers
recently is that the tmpwatch(1) program is run daily from cron,
and it "cleaned up" some tmp files below /var/tmp for an agent
we use here. Since the pmie scripts also live in /var/tmp, it's
also quite possible that it could eat the temp pmie state files.
We worked around this by adding the "-s" switch to the tmpwatch
invocation on /var/tmp (in /etc/cron.daily/tmpwatch).

cheers.
-- Nathan From kimbrr@sgi.com Mon Jul 2 00:03:08 2007 Received: with ECARTIS (v1.0.0; list pcp); Mon, 02 Jul 2007 00:03:12 -0700 (PDT) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id l62732tL001874 for ; Mon, 2 Jul 2007 00:03:05 -0700 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id RAA25977; Mon, 2 Jul 2007 17:02:58 +1000 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id l6272ueW8497953; Mon, 2 Jul 2007 17:02:57 +1000 (AEST) Received: from localhost (kimbrr@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) with ESMTP id l6272rKc8543237; Mon, 2 Jul 2007 17:02:55 +1000 (AEST) X-Authentication-Warning: snort.melbourne.sgi.com: kimbrr owned process doing -bs Date: Mon, 2 Jul 2007 17:02:53 +1000 From: Michael Newton X-X-Sender: kimbrr@snort.melbourne.sgi.com To: Nathan Scott cc: pcp@oss.sgi.com Subject: Re: Review: PCP & pmlogger take too long to start In-Reply-To: <1183356141.15488.223.camel@edge.yarra.acx> Message-ID: References: <1182996127.15488.102.camel@edge.yarra.acx> <1183355238.15488.217.camel@edge.yarra.acx> <1183356141.15488.223.camel@edge.yarra.acx> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1291 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: kimbrr@sgi.com Precedence: bulk X-list: pcp Nathan Scott >The SEE ALSO section could probably reference sleep(1) and nanosleep(2). 
yep >> =========================================================================== >> mgmt/pcp/src/pmcd/rc_pcp >> =========================================================================== >> >> --- a/mgmt/pcp/src/pmcd/rc_pcp 2007-06-29 18:09:45.000000000 +1000 >> +++ b/mgmt/pcp/src/pmcd/rc_pcp 2007-06-29 16:07:49.625951131 +1000 >> @@ -100,6 +100,8 @@ >> ;; >> esac >> >> +SLEEPCMND="$PCP_BINADM_DIR/pmsleep 0.1" >> + > >This variable just seems to be obfuscating the logic, I'd remove it. >PCP_BINADM_DIR is set in the PATH in /etc/pcp.env, so full path isn't >needed there. Since the 0.1 argument is important in several of the >uses (as there is control flow assuming tenths of a second in places), >the argument should be expanded close to the logic thats using it too. if you like >> _pmcd_logfile() >> { >> default=$RUNDIR/pmcd.log >> @@ -383,16 +385,25 @@ >> fi >> $ECHO $PCP_ECHO_N "Waiting for PMCD to >> terminate ...""$PCP_ECHO_C" >> gone=0 >> - for i in 1 2 3 4 5 6 >> + i=0 >> + j=0 >> + while : >> do >> - sleep 3 >> _get_pids_by_name pmcd >$tmp.tmp >> if [ ! -s $tmp.tmp ] >> then >> gone=1 >> break >> fi >> - $ECHO $PCP_ECHO_N ".""$PCP_ECHO_C" >> + i=`expr $i + 1` >> + if [ $i -ge 10 ] >> + then >> + i=0 >> + [ $j -ge $delay ] && break >> + j=`expr $j + 1` >> + $ECHO $PCP_ECHO_N ".""$PCP_ECHO_C" >> + fi >> + $SLEEPCMND >> done >> if [ $gone != 1 ] # It just WON'T DIE, give up. > >Hmmm, thats not right. Firstly, $delay doesn't exist in this script >(it looks like this script snippet has been incorrectly duplicated), yes >so you'll get a shell-syntax-error if that branch (with $j) is taken. ok >Secondly, that branch is b0rken in that it will only ever print one >'.' character - its meant to print one dot every iteration (or every >second now, I guess). You'll want to use an expr modulo (%) there. dont understand. $i is incremented every time round. When it gets to 10 * it gets reset to zero * j is incremented * '.' 
is printed so $i counts tenths of a sec and $j counts whole seconds the structure is more complicated but it avoids the division.. swings & roundabouts? >Thirdly, the while logic is strange - you could just use the normal >loop control mechanism (rather than the "while :" and explicit break >statements). > >(I've attached an alternate patch that implements these things). as i said before, test at the top means that you are not testing the condition which is the actual purpose of the loop (in this case whether pmcd has gone away yet) after the final sleep.. so there wasnt much point in doing it! btw i also aimed to keep the diffs minimal >> =========================================================================== >> mgmt/pcp/src/pmie/pmie_check.sh >> =========================================================================== > >> =========================================================================== >> mgmt/pcp/src/pmlogctl/pmlogger_check.sh >> =========================================================================== > >These two scripts have the same sorts of problems, fixed in attached >patch. > >> >> =========================================================================== >> mgmt/pcp/src/pmsleep/pmsleep.c >> =========================================================================== >> ... >> +int >> +main(int argc, char **argv) >> +{ >> + struct timespec rqt; >> + struct timeval delta; >> + int r = 0; >> + char *msg; >> + >> + if (argc == 2) { >> + if (pmParseInterval(argv[1], &delta, &msg) < 0) { >> + fputs(msg, stderr); >> + free(msg); >> + } else { >> + rqt.tv_sec = delta.tv_sec; >> + rqt.tv_nsec = delta.tv_usec * 1000; >> + if (0 != nanosleep(&rqt, NULL)) >> + r = errno; >> + >> + exit(r); >> + } >> + } >> + fprintf(stderr, "Usage: pmsleep [-v] interval\n"); > >There's no -v option. oops! > The 'r' variable isn't really necessary - >see my attached patch that makes this slightly simpler. very slightly! 
is there a reason to prefer "return" over "exit"?

> Was it intentional to print a usage message when a pmParseInterval error
>occurs? Seems a bit odd - *shrug*, not a big deal obviously.

yes it was intentional -- doesn't seem odd to me - why wouldn't you
print one?
It's consistent with the behaviour of, say, pmval when given a malformed
-t value

>BTW, one other thing I noticed here in the pmie and pmlogger
>check scripts - theres a comment near the start of the loop
>in _check_logger() that doesn't match what the code does.
>And the code looks wrong - the comment is:
>
># $logfile was previously removed, if it has appeared again
># then we know pmlogger has started ... if not just sleep and
># try again
>
>But what it actually seems to do is sleep when pmlogger has
>started, just prior to returning from the call (when we know
>pmlogger has started...).

of course i didn't originate this code but i think you have
misinterpreted it. Possibly the comment ought to be one step higher up
the file, eg before
    if [ -f $logfile ]
in pmie_check.sh.

There are 2 sleeps within the loop, one of which is done every time,
and one which is only done at the end. The comment i believe refers to the
one which is done every time, which is lexically the 2nd of the 2. You are
looking at the lexically 1st, which is only done at the end.

For instance in pmlogger_check, if you have succeeded in making a
connection, you know pmlogger is through its initialisation, but now
you need to give it a little time to finish cleaning up the
connection you have just relinquished, so as to be ready to
talk to a real client. That's my theory anyway!

In pmie_check perhaps the theory is there may still be a little init to be
done after the log file has appeared -- or maybe the idiom was just
copied from pmlogger_check -- but anyway now that it's only 0.1s i
don't think it's doing any harm.
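The earlier point in this message about a test at the top of the loop (the condition is never examined after the final sleep) can be sketched as below. This is an illustration only, not code from either patch: `_gone` is a hypothetical stand-in for the real "has pmcd gone away" check, the 3-tick timeout is arbitrary, plain `sleep 0.1` stands in for `pmsleep 0.1` (assuming a sleep(1) that accepts fractional seconds), and shell arithmetic replaces `expr` for brevity.

```shell
# Hypothetical condition: only becomes true on the 4th check, i.e. the
# state changes during the final sleep of a 3-tick timeout.
checks=0
_gone() { checks=$((checks + 1)); [ "$checks" -ge 4 ]; }

delay=3                 # tenths of a second
gone=false
while [ "$delay" -gt 0 ]
do
    if _gone; then gone=true; break; fi
    sleep 0.1           # stand-in for pmsleep 0.1
    delay=$((delay - 1))
done

# Without this extra check the loop would report a timeout even though
# the condition was satisfied during the last sleep.
if ! $gone && _gone; then gone=true; fi

echo "gone=$gone checks=$checks"
```

With the made-up condition this prints `gone=true checks=4`; dropping the post-loop check leaves `gone=false` after three checks, which is the case being objected to.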
ps i really need to get this in Dr.Michael("Kimba")Newton kimbrr@sgi.com From nscott@aconex.com Mon Jul 2 16:08:48 2007 Received: with ECARTIS (v1.0.0; list pcp); Mon, 02 Jul 2007 16:08:53 -0700 (PDT) Received: from postoffice.aconex.com (mail.app.aconex.com [203.89.192.138]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l62N8ktL028724 for ; Mon, 2 Jul 2007 16:08:48 -0700 Received: from edge.yarra.acx (unknown [203.89.192.141]) by postoffice.aconex.com (Postfix) with ESMTP id B4A4292C4B1; Tue, 3 Jul 2007 09:08:48 +1000 (EST) Subject: Re: Review: PCP & pmlogger take too long to start From: Nathan Scott Reply-To: nscott@aconex.com To: Michael Newton Cc: pcp@oss.sgi.com In-Reply-To: References: <1182996127.15488.102.camel@edge.yarra.acx> <1183355238.15488.217.camel@edge.yarra.acx> <1183356141.15488.223.camel@edge.yarra.acx> Content-Type: text/plain Organization: Aconex Date: Tue, 03 Jul 2007 09:07:58 +1000 Message-Id: <1183417678.15488.257.camel@edge.yarra.acx> Mime-Version: 1.0 X-Mailer: Evolution 2.6.3 Content-Transfer-Encoding: 7bit X-archive-position: 1292 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: nscott@aconex.com Precedence: bulk X-list: pcp On Mon, 2007-07-02 at 17:02 +1000, Michael Newton wrote: > > >Secondly, that branch is b0rken in that it will only ever print one > >'.' character - its meant to print one dot every iteration (or every > >second now, I guess). You'll want to use an expr modulo (%) there. > > dont understand. $i is incremented every time round. When it gets to > 10 > * it gets reset to zero > * j is incremented > * '.' is printed > > so $i counts tenths of a sec and $j counts whole seconds Yep, I thought $delay was the loop step when I wrote that (hadn't seen the other two scripts then that really use it at that stage) - ignore that comment. > > The 'r' variable isn't really necessary - > >see my attached patch that makes this slightly simpler. 
> > very slightly! is there a reason to prefer "return" over "exit"?

:) - yep, it's trivial, couldn't help myself. No reason for one over
the other here, the only case I know of where you'd choose exit is if an
atexit handler is needed.

> yes it was intentional -- doesnt seem odd to me - why wouldnt you
> print one?
> Its consistent with the behaviour of say pmval when given a malformed
> -t value

Fair enough. My thinking was the errno from pmParseInterval might
not have anything to do with bad usage (like ENOMEM or something),
but I guess in practice it probably always does.

> btw i also aimed to keep the diffs minimal

If it comes down to simple-and-readable vs minimal-and-odd-looking,
I'd pick simple-and-readable every time - these scripts are hairy
enough already.

> of course i didnt originate this code but i think you have
> misinterpreted it.

*nod*, yep, I see it now.

> you know pmlogger is through its initialisation, but now
> you need to give it a little time to finish cleaning up the
> connection you have just relinquished, so as to be ready to
> talk to a real client.

That bit I don't buy - the startup script uses pmlc to connect
as this guarantees that pmlogger is started up and ready - there
isn't any "little time" after a pmlc connection where pmlogger is
not available for another connection. It seems that additional
sleep there is redundant - I'll ask Ken though when he's back if
he can remember why it was added (any chance you can do some rlog
sleuthing, see if it was added in response to a bug?).

Really odd that it was a full three second sleep too.

The pmie script is based closely on the pmlogger one (was copied
at one point, by me) - so, the pmlogger behaviour and history
there is much more interesting.

cheers.
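For reference, the two dot-printing schemes compared in this thread (the pair of counters from the original patch, and a single countdown tested with a modulo) come out the same. The sketch below is an illustration under assumptions rather than code from either patch: it counts dots instead of echoing them, omits the sleep, uses an arbitrary 30 ticks, and uses shell arithmetic where the scripts use `expr`.

```shell
ticks=30                # 30 tenths of a second = 3 seconds

# Scheme 1: $i counts tenths and is reset every tenth tick.
i=0; t=0; dots_counters=0
while [ "$t" -lt "$ticks" ]
do
    t=$((t + 1))
    i=$((i + 1))
    if [ "$i" -ge 10 ]
    then
        i=0
        dots_counters=$((dots_counters + 1))    # would print "."
    fi
done

# Scheme 2: one countdown, a dot whenever it is a multiple of ten.
delay=$ticks; dots_modulo=0
while [ "$delay" -gt 0 ]
do
    delay=$((delay - 1))
    [ $((delay % 10)) -ne 0 ] || dots_modulo=$((dots_modulo + 1))
done

echo "counters=$dots_counters modulo=$dots_modulo"
```

Both schemes count three dots for 30 ticks (`counters=3 modulo=3`), so the choice between them is about readability rather than behaviour.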
-- Nathan From kimbrr@sgi.com Mon Jul 2 17:41:55 2007 Received: with ECARTIS (v1.0.0; list pcp); Mon, 02 Jul 2007 17:42:01 -0700 (PDT) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id l630fqtL022501 for ; Mon, 2 Jul 2007 17:41:54 -0700 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id KAA21033; Tue, 3 Jul 2007 10:41:49 +1000 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id l630fleW9537517; Tue, 3 Jul 2007 10:41:48 +1000 (AEST) Received: from localhost (kimbrr@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) with ESMTP id l630fjII9570738; Tue, 3 Jul 2007 10:41:46 +1000 (AEST) X-Authentication-Warning: snort.melbourne.sgi.com: kimbrr owned process doing -bs Date: Tue, 3 Jul 2007 10:41:45 +1000 From: Michael Newton X-X-Sender: kimbrr@snort.melbourne.sgi.com To: Nathan Scott cc: pcp@oss.sgi.com Subject: Re: Review: PCP & pmlogger take too long to start In-Reply-To: <1183417678.15488.257.camel@edge.yarra.acx> Message-ID: References: <1182996127.15488.102.camel@edge.yarra.acx> <1183355238.15488.217.camel@edge.yarra.acx> <1183356141.15488.223.camel@edge.yarra.acx> <1183417678.15488.257.camel@edge.yarra.acx> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1293 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: kimbrr@sgi.com Precedence: bulk X-list: pcp On Tue, 3 Jul 2007, Nathan Scott wrote: > On Mon, 2007-07-02 at 17:02 +1000, Michael Newton wrote: > > you know pmlogger is through its initialisation, but now > > you need to give it a little time to finish cleaning up the > > connection you have just relinquished, so as to be ready to > > talk to a real client. 
>
> That bit I don't buy - the startup script uses pmlc to connect
> as this guarantees that pmlogger is started up and ready - there
> isn't any "little time" after a pmlc connection where pmlogger is
> not available for another connection. It seems that additional
> sleep there is redundant - I'll ask Ken though when he's back if
> he can remember why it was added (any chance you can do some rlog
> sleuthing, see if it was added in response to a bug?).

p_rlog only shows me back to 1.22, which still has the pre-exit sleep

> Really odd that it was a full three second sleep too.

5, i believe

i'll make the agreed changes and retest. Unless something else happens
in the meantime, i'll commit it.. this is taking too long.. minor
fix-ups can be done later

Dr.Michael("Kimba")Newton kimbrr@sgi.com

From kimbrr@sgi.com Tue Jul 3 03:19:33 2007
Received: with ECARTIS (v1.0.0; list pcp); Tue, 03 Jul 2007 03:19:38 -0700 (PDT)
Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id l63AJStL008596 for ; Tue, 3 Jul 2007 03:19:30 -0700
Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id UAA05289; Tue, 3 Jul 2007 20:19:25 +1000
Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id l63AJOeW9834232; Tue, 3 Jul 2007 20:19:24 +1000 (AEST)
Received: from localhost (kimbrr@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) with ESMTP id l63AJLI49713521; Tue, 3 Jul 2007 20:19:23 +1000 (AEST)
X-Authentication-Warning: snort.melbourne.sgi.com: kimbrr owned process doing -bs
Date: Tue, 3 Jul 2007 20:19:20 +1000
From: Michael Newton
X-X-Sender: kimbrr@snort.melbourne.sgi.com
To: Nathan Scott
cc: pcp@oss.sgi.com
Subject: Re: Review: PCP & pmlogger take too long to start
In-Reply-To:
<1183417678.15488.257.camel@edge.yarra.acx> Message-ID: References: <1182996127.15488.102.camel@edge.yarra.acx> <1183355238.15488.217.camel@edge.yarra.acx> <1183356141.15488.223.camel@edge.yarra.acx> <1183417678.15488.257.camel@edge.yarra.acx> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1294 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: kimbrr@sgi.com Precedence: bulk X-list: pcp i decided to repost as ive tried to take on board favouring readability & conciseness over keeping the diffs short =========================================================================== mgmt/pcp/man/man1/GNUmakefile =========================================================================== --- a/mgmt/pcp/man/man1/GNUmakefile 2007-07-03 20:16:14.000000000 +1000 +++ b/mgmt/pcp/man/man1/GNUmakefile 2007-06-29 15:36:13.632867332 +1000 @@ -19,7 +19,7 @@ pmnsmerge.1 pmpost.1 pmprobe.1 pmsocks.1 pmstat.1 pmstore.1 \ pmtrace.1 pmval.1 pmdaweblog.1 pmlogsummary.1 pmdashping.1 \ pmdumptext.1 genpmda.1 pmproxy.1 pmdasummary.1 pmlogreduce.1 \ - autofsd-probe.1 pmie2col.1 telnet-probe.1 + autofsd-probe.1 pmie2col.1 telnet-probe.1 pmsleep.1 MAN_DEST = $(PCP_MAN_DIR)/man$(MAN_SECTION) LSRCFILES = $(MAN_PAGES) =========================================================================== mgmt/pcp/man/man1/pmsleep.1 =========================================================================== --- a/mgmt/pcp/man/man1/pmsleep.1 2006-06-17 00:58:24.000000000 +1000 +++ b/mgmt/pcp/man/man1/pmsleep.1 2007-07-03 16:48:17.618316074 +1000 @@ -0,0 +1,45 @@ +'\"macro stdmacro +.\" +.\" Copyright (c) 2007 Silicon Graphics, Inc. All Rights Reserved. +.\" +.\" $Id$ +.ie \(.g \{\ +.\" ... groff (hack for khelpcenter, man2html, etc.) 
+.TH PMSLEEP 1 "SGI" "Performance Co-Pilot" +\} +.el \{\ +.if \nX=0 .ds x} PMSLEEP 1 "SGI" "Performance Co-Pilot" +.if \nX=1 .ds x} PMSLEEP 1 "Performance Co-Pilot" +.if \nX=2 .ds x} PMSLEEP 1 "" "\&" +.if \nX=3 .ds x} PMSLEEP "" "" "\&" +.TH \*(x} +.rr X +\} +.SH NAME +\f3pmsleep\f1 \- portable subsecond-capable sleep +.\" literals use .B or \f3 +.\" arguments use .I or \f2 +.SH SYNOPSIS +.B $PCP_BINADM_DIR/pmsleep +.I interval +.SH DESCRIPTION +.B pmsleep +sleeps for +.I interval. +The +.I interval +argument follows the syntax described in +.BR PCPIntro (1) +for +.B \-t, +and in the simplest form may be an unsigned integer +or floating point constant +(the implied units in this case are seconds). +.SH DIAGNOSTICS +The exit status is 0 for success, or 1 for a malformed command line. +If the underlying +.B nanosleep (2) +system call fails, an errno is returned. +.SH SEE ALSO +.BR sleep (1), +.BR nanosleep (3). =========================================================================== mgmt/pcp/src/GNUmakefile =========================================================================== --- a/mgmt/pcp/src/GNUmakefile 2007-07-03 20:16:14.000000000 +1000 +++ b/mgmt/pcp/src/GNUmakefile 2007-06-29 14:46:06.336727771 +1000 @@ -21,7 +21,7 @@ pmdumplog pmlogextract pmstore pmhostname pmgenmap pmlogctl \ pmlogconf pmlogsummary pmclient pmkstat pcp pmlc dbpmda \ xconfirm pmtrace pmstat pmsocks pmdas pmafm procmemstat \ - pmlogreduce genpmda pmproxy telnet-probe + pmlogreduce genpmda pmproxy telnet-probe pmsleep ifneq ($(TARGET_OS), cygwin) SUBDIRS += libpcp_pmc pmdumptext autofsd-probe =========================================================================== mgmt/pcp/src/pmcd/rc_pcp =========================================================================== --- a/mgmt/pcp/src/pmcd/rc_pcp 2007-07-03 20:16:14.000000000 +1000 +++ b/mgmt/pcp/src/pmcd/rc_pcp 2007-07-03 19:29:52.089858288 +1000 @@ -383,16 +383,19 @@ fi $ECHO $PCP_ECHO_N "Waiting for PMCD to terminate 
...""$PCP_ECHO_C" gone=0 - for i in 1 2 3 4 5 6 + delay=200 + while [ $i -lt $delay ] do - sleep 3 + # dont sleep before 1st pid check, or after last + [ $i -eq 0 ] || pmsleep 0.1 + i=`expr $i + 1` + $VERBOSE && [ `expr $delay % 10` -eq 0 ] && $PCP_ECHO_PROG $PCP_ECHO_N ".""$PCP_ECHO_C" _get_pids_by_name pmcd >$tmp.tmp if [ ! -s $tmp.tmp ] then gone=1 break fi - $ECHO $PCP_ECHO_N ".""$PCP_ECHO_C" done if [ $gone != 1 ] # It just WON'T DIE, give up. then =========================================================================== mgmt/pcp/src/pmcd/src/agent.c =========================================================================== --- a/mgmt/pcp/src/pmcd/src/agent.c 2007-07-03 20:16:14.000000000 +1000 +++ b/mgmt/pcp/src/pmcd/src/agent.c 2007-06-26 14:28:22.912602167 +1000 @@ -166,7 +166,7 @@ found = 0; for ( i = 0; i < nAgents; i++) { ap = &agent[i]; - if (!ap->status.connected) + if (!ap->status.connected || ap->ipcType == AGENT_DSO) continue; found = 1; =========================================================================== mgmt/pcp/src/pmie/pmie_check.sh =========================================================================== --- a/mgmt/pcp/src/pmie/pmie_check.sh 2007-07-03 20:16:14.000000000 +1000 +++ b/mgmt/pcp/src/pmie/pmie_check.sh 2007-07-03 20:09:44.071302396 +1000 @@ -144,44 +144,41 @@ { # demand mutual exclusion # - fail=true rm -f $tmp.stamp - for try in 1 2 3 4 + i=0 + while [ $i -lt 200 ] do + # dont sleep before 1st lock check, or after last + [ $i -eq 0 ] || pmsleep 0.1 + i=`expr $i + 1` + if pmlock -v $logfile.lock >$tmp.out then echo $logfile.lock >$tmp.lock - fail=false - break - else - if [ ! -f $tmp.stamp ] - then - touch -t `pmdate -30M %Y%m%d%H%M` $tmp.stamp - fi - if [ -n "`find $logfile.lock ! 
-newer $tmp.stamp -print 2>/dev/null`" ] - then - _warning "removing lock file older than 30 minutes" - ls -l $logfile.lock - rm -f $logfile.lock - fi + return 0 fi - sleep 5 - done - - if $fail - then - # failed to gain mutex lock - # - if [ -f $logfile.lock ] + if [ ! -f $tmp.stamp ] then - _warning "is another PCP cron job running concurrently?" + touch -t `pmdate -30M %Y%m%d%H%M` $tmp.stamp + fi + if [ -n "`find $logfile.lock ! -newer $tmp.stamp -print 2>/dev/null`" ] + then + _warning "removing lock file older than 30 minutes" ls -l $logfile.lock - else - echo "$prog: `cat $tmp.out`" + rm -f $logfile.lock fi - _warning "failed to acquire exclusive lock ($logfile.lock) ..." - continue + done + # failed to gain mutex lock + # + if [ -f $logfile.lock ] + then + warning "is another PCP cron job running concurrently?" + ls -l $logfile.lock + else + echo "$prog: `cat $tmp.out`" fi + _warning "failed to acquire exclusive lock ($logfile.lock) ..." + return 1 } _unlock() @@ -270,51 +267,49 @@ # wait for maximum time of a connection and 20 requests # - delay=`expr $delay + 20 \* $x` + # $logfile was previously removed, if it has appeared again + # then we know pmlogger has started ... if not just sleep and + # try again + # + delay=`expr 10 \* \( $delay + 20 \* $x \) ` i=0 while [ $i -lt $delay ] do - $VERBOSE && $PCP_ECHO_PROG $PCP_ECHO_N ".""$PCP_ECHO_C" - if [ -f $logfile ] + # dont sleep before 1st log check + [ $i -eq 0 ] || pmsleep 0.1 + i=`expr $i + 1` + $VERBOSE && [ `expr $delay % 10` -eq 0 ] && $PCP_ECHO_PROG $PCP_ECHO_N ".""$PCP_ECHO_C" + [ ! -f $logfile ] && continue + if ls $PCP_TMP_DIR/pmie/$1 >$tmp.out 2>&1 then - # $logfile was previously removed, if it has appeared again then - # we know pmie has started ... 
if not just sleep and try again - # - if ls $PCP_TMP_DIR/pmie/$1 >$tmp.out 2>&1 - then - if grep "No such file or directory" $tmp.out >/dev/null - then - : - else - sleep 5 - $VERBOSE && echo " done" - return 0 - fi - fi - case "$PCP_PLATFORM" - in - irix) - ps -e | grep "^ *$1 " >/dev/null - ;; - linux) - test -e /proc/$1 - ;; - esac - - if [ $? -ne 0 ] + if grep "No such file or directory" $tmp.out >/dev/null then - $VERBOSE || _message restart - echo " process exited!" - echo "$prog: Error: failed to restart pmie" - echo "Current pmie processes:" - ps $PCP_PS_ALL_FLAGS | sed -n -e 1p -e "/$PMIE/p" - echo - _check_logfile - return 1 + : + else + pmsleep 0.1 + $VERBOSE && echo " done" + return 0 fi fi - sleep 5 - i=`expr $i + 5` + case "$PCP_PLATFORM" + in + irix) + ps -e | grep "^ *$1 " >/dev/null + ;; + linux) + test -e /proc/$1 + ;; + esac + + [ $? -eq 0 ] && continue + $VERBOSE || _message restart + echo " process exited!" + echo "$prog: Error: failed to restart pmie" + echo "Current pmie processes:" + ps $PCP_PS_ALL_FLAGS | sed -n -e 1p -e "/$PMIE/p" + echo + _check_logfile + return 1 done $VERBOSE || _message restart echo " timed out waiting!" 
@@ -434,8 +429,11 @@ then _warning "no write access in $dir, skip lock file processing" ls -ld $dir + elif _lock + then + : else - _lock + continue fi # match $logfile and $fqdn from control file to running pmies @@ -630,13 +628,20 @@ then $VERY_VERBOSE && ( echo; $PCP_ECHO_PROG $PCP_ECHO_N "+ $KILL -KILL `cat $tmp.pmies` ...""$PCP_ECHO_C" ) eval $KILL -KILL $pmielist >/dev/null 2>&1 - sleep 3 # give them a chance to go - if ps -f -p "$pmielist" >$tmp.alive 2>&1 - then + i=0 + while ps -f -p "$pmielist" >$tmp.alive 2>&1 + do + if [ $i -lt 30 ] + then + pmsleep 0.1 + i=`expr $i + 1` + continue; + fi echo "$prog: Error: pmie process(es) will not die" cat $tmp.alive status=1 - fi + break + done fi fi =========================================================================== mgmt/pcp/src/pmlogctl/pmlogger_check.sh =========================================================================== --- a/mgmt/pcp/src/pmlogctl/pmlogger_check.sh 2007-07-03 20:16:14.000000000 +1000 +++ b/mgmt/pcp/src/pmlogctl/pmlogger_check.sh 2007-07-03 20:10:13.687441468 +1000 @@ -192,60 +192,51 @@ # wait for maximum time of a connection and 20 requests # - delay=`expr $delay + 20 \* $x` + # $logfile was previously removed, if it has appeared again + # then we know pmlogger has started ... if not just sleep and + # try again + # + delay=`expr 10 \* \( $delay + 20 \* $x \) ` i=0 - while [ $i -lt $delay ] + while [ $i -lt $delay ] # caution: nested continue 2 below do - $VERBOSE && $PCP_ECHO_PROG $PCP_ECHO_N ".""$PCP_ECHO_C" - if [ -f $logfile ] + # dont sleep before 1st log check + [ $i -eq 0 ] || pmsleep 0.1 + i=`expr $i + 1` + $VERBOSE && [ `expr $delay % 10` -eq 0 ] && $PCP_ECHO_PROG $PCP_ECHO_N ".""$PCP_ECHO_C" + [ ! -f $logfile ] && continue + if echo "connect $1" | pmlc 2>&1 | grep "Unable to connect" >/dev/null then - # $logfile was previously removed, if it has appeared again - # then we know pmlogger has started ... 
if not just sleep and - # try again - # - if echo "connect $1" | pmlc 2>&1 | grep "Unable to connect" >/dev/null - then - : - else - sleep 5 - $VERBOSE && echo " done" - return 0 - fi + : + else + pmsleep 0.1 + $VERBOSE && echo " done" + return 0 + fi - _plist=`_get_pids_by_name pmlogger` - _found=false + _plist=`_get_pids_by_name pmlogger` + _found=false + for _p in `echo $_plist` + do + [ $_p -eq $1 ] && continue 2 + done + $VERBOSE || _message restart + echo " process exited!" + if $TERSE + then + : + else + echo "$prog: Error: failed to restart pmlogger" + echo "Current pmlogger processes:" + ps $PCP_PS_ALL_FLAGS | tee $tmp.tmp | sed -n -e 1p for _p in `echo $_plist` - do - [ $_p -eq $1 ] && _found=true - done - - if $_found - then - # process still here, just not accepting pmlc connections - # yet, try again - : - else - $VERBOSE || _message restart - echo " process exited!" - if $TERSE - then - : - else - echo "$prog: Error: failed to restart pmlogger" - echo "Current pmlogger processes:" - ps $PCP_PS_ALL_FLAGS | tee $tmp.tmp | sed -n -e 1p - for _p in `echo $_plist` - do - sed -n -e "/^[ ]*[^ ]* [ ]*$_p /p" < $tmp.tmp - done - echo - fi - _check_logfile - return 1 - fi + do + sed -n -e "/^[ ]*[^ ]* [ ]*$_p /p" < $tmp.tmp + done + echo fi - sleep 5 - i=`expr $i + 5` + _check_logfile + return 1 done $VERBOSE || _message restart echo " timed out waiting!" @@ -259,6 +250,56 @@ return 1 } +_lock() +{ + # demand mutual exclusion + # + rm -f $tmp.stamp + i=0 + while [ $i -lt 200 ] + do + # dont sleep before 1st lock check, or after last + [ $i -eq 0 ] || pmsleep 0.1 + i=`expr $i + 1` + + if pmlock -v lock >$tmp.out + then + echo $dir/lock >$tmp.lock + return 0 + fi + if [ ! -f $tmp.stamp ] + then + if uname -r | grep '^5\.3' >/dev/null + then + # IRIX 5.3 does not support -t for touch(1) + # + touch `pmdate -30M %m%d%H%M%y` $tmp.stamp + else + touch -t `pmdate -30M %Y%m%d%H%M` $tmp.stamp + fi + fi + if [ ! 
-z "`find lock -newer $tmp.stamp -print 2>/dev/null`" ] + then + : + else + echo "$prog: Warning: removing lock file older than 30 minutes" + LC_TIME=POSIX ls -l $dir/lock + rm -f lock + fi + done + # failed to gain mutex lock + # + if [ -f lock ] + then + echo "$prog: Warning: is another PCP cron job running concurrently?" + LC_TIME=POSIX ls -l $dir/lock + else + echo "$prog: `cat $tmp.out`" + fi + _warning "failed to acquire exclusive lock ($dir/lock) ..." + return 1 +} + # note on control file format version # 1.0 was shipped as part of PCPWEB beta, and did not include the # socks field [this is the default for backwards compatibility] @@ -374,56 +415,11 @@ if [ ! -w $dir ] then echo "$prog: Warning: no write access in $dir, skip lock file processing" + elif _lock + then + : else - # demand mutual exclusion - # - fail=true - rm -f $tmp.stamp - for try in 1 2 3 4 - do - if pmlock -v lock >$tmp.out - then - echo $dir/lock >$tmp.lock - fail=false - break - else - if [ ! -f $tmp.stamp ] - then - if uname -r | grep '^5\.3' >/dev/null - then - # IRIX 5.3 does not support -t for touch(1) - # - touch `pmdate -30M %m%d%H%M%y` $tmp.stamp - else - touch -t `pmdate -30M %Y%m%d%H%M` $tmp.stamp - fi - fi - if [ ! -z "`find lock -newer $tmp.stamp -print 2>/dev/null`" ] - then - : - else - echo "$prog: Warning: removing lock file older than 30 minutes" - LC_TIME=POSIX ls -l $dir/lock - rm -f lock - fi - fi - sleep 5 - done - - if $fail - then - # failed to gain mutex lock - # - if [ -f lock ] - then - echo "$prog: Warning: is another PCP cron job running concurrently?" - LC_TIME=POSIX ls -l $dir/lock - else - echo "$prog: `cat $tmp.out`" - fi - _warning "failed to acquire exclusive lock ($dir/lock) ..." 
- continue - fi + continue fi pid='' =========================================================================== mgmt/pcp/src/pmsleep/GNUmakefile =========================================================================== --- a/mgmt/pcp/src/pmsleep/GNUmakefile 2006-06-17 00:58:24.000000000 +1000 +++ b/mgmt/pcp/src/pmsleep/GNUmakefile 2007-06-29 14:33:28.335332331 +1000 @@ -0,0 +1,25 @@ +#!gmake +# +# Copyright (c) 2007 Silicon Graphics, Inc. All Rights Reserved. +# +# $Id$ +# + +TOPDIR = ../.. +include $(TOPDIR)/src/include/builddefs + +LLDLIBS = -lpcp +CFILES = pmsleep.c +CMDTARGET = pmsleep$(EXECSUFFIX) +LDIRT = $(TARGET) + +default: $(CMDTARGET) + +include $(BUILDRULES) + +install: $(CMDTARGET) + $(INSTALL) -m 755 $(CMDTARGET) $(PCP_BINADM_DIR)/$(CMDTARGET) + +default_pcp: default + +install_pcp: install =========================================================================== mgmt/pcp/src/pmsleep/pmsleep.c =========================================================================== --- a/mgmt/pcp/src/pmsleep/pmsleep.c 2006-06-17 00:58:24.000000000 +1000 +++ b/mgmt/pcp/src/pmsleep/pmsleep.c 2007-07-03 17:05:11.731485122 +1000 @@ -0,0 +1,35 @@ +/* + * Copyright (c) 2007 Silicon Graphics, Inc. All Rights Reserved. 
+ */ + +#include +#include +#include +#include +#include "pmapi.h" + +int +main(int argc, char **argv) +{ + struct timespec rqt; + struct timeval delta; + int r = 0; + char *msg; + + if (argc == 2) { + if (pmParseInterval(argv[1], &delta, &msg) < 0) { + fputs(msg, stderr); + free(msg); + } else { + rqt.tv_sec = delta.tv_sec; + rqt.tv_nsec = delta.tv_usec * 1000; + if (0 != nanosleep(&rqt, NULL)) + r = errno; + + exit(r); + } + } + fprintf(stderr, "Usage: pmsleep interval\n"); + exit(1); + /*NOTREACHED*/ +} Dr.Michael("Kimba")Newton kimbrr@sgi.com From nscott@aconex.com Tue Jul 3 16:32:20 2007 Received: with ECARTIS (v1.0.0; list pcp); Tue, 03 Jul 2007 16:32:25 -0700 (PDT) Received: from postoffice.aconex.com (mail.app.aconex.com [203.89.192.138]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l63NWItL011999 for ; Tue, 3 Jul 2007 16:32:20 -0700 Received: from edge.yarra.acx (unknown [203.89.192.141]) by postoffice.aconex.com (Postfix) with ESMTP id EE5A792C3A3; Wed, 4 Jul 2007 09:32:18 +1000 (EST) Subject: Re: Review: PCP & pmlogger take too long to start From: Nathan Scott Reply-To: nscott@aconex.com To: Michael Newton Cc: pcp@oss.sgi.com In-Reply-To: References: <1182996127.15488.102.camel@edge.yarra.acx> <1183355238.15488.217.camel@edge.yarra.acx> <1183356141.15488.223.camel@edge.yarra.acx> <1183417678.15488.257.camel@edge.yarra.acx> Content-Type: text/plain Organization: Aconex Date: Wed, 04 Jul 2007 09:31:30 +1000 Message-Id: <1183505491.15488.330.camel@edge.yarra.acx> Mime-Version: 1.0 X-Mailer: Evolution 2.6.3 Content-Transfer-Encoding: 7bit X-archive-position: 1295 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: nscott@aconex.com Precedence: bulk X-list: pcp On Tue, 2007-07-03 at 20:19 +1000, Michael Newton wrote: > i decided to repost as ive tried to take on board favouring > readability & > conciseness over keeping the diffs short This isn't quite right still, and I'm 
confused as to why you are looking for a halfway point between the old code and the simpler version that I posted? In particular: > > =========================================================================== > mgmt/pcp/src/pmcd/rc_pcp > =========================================================================== > > --- a/mgmt/pcp/src/pmcd/rc_pcp 2007-07-03 20:16:14.000000000 +1000 > +++ b/mgmt/pcp/src/pmcd/rc_pcp 2007-07-03 19:29:52.089858288 +1000 > @@ -383,16 +383,19 @@ > fi > $ECHO $PCP_ECHO_N "Waiting for PMCD to > terminate ...""$PCP_ECHO_C" > gone=0 > - for i in 1 2 3 4 5 6 > + delay=200 > + while [ $i -lt $delay ] > do > - sleep 3 > + # dont sleep before 1st pid check, or after last > + [ $i -eq 0 ] || pmsleep 0.1 > + i=`expr $i + 1` This is what I meant with "halfway" - this keeps $i for no reason AFAICT (am i missing something?) - $i was the 123456 loop counter before, but now that we can control the loop using $delay, it's redundant. And why keep the sleep at the top of the loop, with that extra zero check, instead of having it as the end of the loop, and skipping out if we've started the process? > + $VERBOSE && [ `expr $delay % 10` -eq 0 ] && $PCP_ECHO_PROG > $PCP_ECHO_N ".""$PCP_ECHO_C" Little bug there - $delay is a constant, in your version, so you will only ever print one '.'. In my version, delay was used to count down to zero, so using $delay there was valid for my patch, but not yours. > _get_pids_by_name pmcd >$tmp.tmp > if [ ! -s $tmp.tmp ] > then > gone=1 > break > fi > - $ECHO $PCP_ECHO_N ".""$PCP_ECHO_C" > done > if [ $gone != 1 ] # It just WON'T DIE, give up. And when using $delay as the loop control, we can also remove the special $gone boolean flag, since we always know at the end of the loop which way it ended. Was there something not working in my patch that it needed to be reworked? 
This is what these loops should look like IMO (and they should have
the same control structure in all of the scripts) - it is very easy
to follow this loop structure, compared to the other way(s) with the
additional variables:

    delay=200 # tenths of a second
    while [ $delay -gt 0 ]
    do
        if pmlock -v lock >$tmp.out
        then
            echo $dir/lock >$tmp.lock
            break
        else
            [ -f $tmp.stamp ] || touch -t `pmdate -30M %Y%m%d%H%M` $tmp.stamp
            if [ -z "`find lock -newer $tmp.stamp -print 2>/dev/null`" ]
            then
                echo "$prog: Warning: removing lock file older than 30 minutes"
                LC_TIME=POSIX ls -l $dir/lock
                rm -f lock
            fi
        fi
        pmsleep 0.1
        delay=`expr $delay - 1`
    done

    if [ $delay -eq 0 ]

Also, as done above, this code should be removed entirely:

+ if uname -r | grep '^5\.3' >/dev/null
+ then
+ # IRIX 5.3 does not support -t for touch(1)
+ #
+ touch `pmdate -30M %m%d%H%M%y` $tmp.stamp
+ else

It was OK when IRIX was the only platform, but now any platform
that has a 5.3 version will get caught in that code accidentally.
And IRIX 5.3 is ancient history, so just toss this special case.
I'll update my git tree with that patch shortly.

cheers.
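[Editorial sketch, not part of the original message.] The point about
the release-only test can be illustrated with a small sketch. Both
function names are hypothetical, the release string "5.3-RELEASE" is
just an example input, and the assumption that IRIX reports a system
name beginning with "IRIX" is mine rather than the thread's. The
resolution in the thread is to delete the special case, not to refine
it; this only shows why the old test was fragile.

```shell
#!/bin/sh
# release_only_check RELEASE
# Mirrors the removed test: it looks at the release string alone,
# so it cannot tell which operating system produced it.
release_only_check()
{
    echo "$1" | grep '^5\.3' >/dev/null
}

# needs_old_touch SYSNAME RELEASE   (hypothetical alternative)
# Succeeds only when both the system name and the release match.
# Assumption: the system name on IRIX starts with "IRIX".
needs_old_touch()
{
    case "$1:$2" in
        IRIX*:5.3*) return 0 ;;
    esac
    return 1
}
```

With these definitions, release_only_check accepts any release that
starts with 5.3 regardless of platform, whereas needs_old_touch
rejects the same release when the system name is not IRIX.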
-- Nathan From nscott@aconex.com Tue Jul 3 16:48:04 2007 Received: with ECARTIS (v1.0.0; list pcp); Tue, 03 Jul 2007 16:48:08 -0700 (PDT) Received: from postoffice.aconex.com (mail.app.aconex.com [203.89.192.138]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l63Nm1tL016312 for ; Tue, 3 Jul 2007 16:48:03 -0700 Received: from edge.yarra.acx (unknown [203.89.192.141]) by postoffice.aconex.com (Postfix) with ESMTP id 62D9F92C37A for ; Wed, 4 Jul 2007 09:48:03 +1000 (EST) Subject: pcp updates From: Nathan Scott Reply-To: nscott@aconex.com To: pcp@oss.sgi.com Content-Type: text/plain Organization: Aconex Date: Wed, 04 Jul 2007 09:47:15 +1000 Message-Id: <1183506435.15488.334.camel@edge.yarra.acx> Mime-Version: 1.0 X-Mailer: Evolution 2.6.3 Content-Transfer-Encoding: 7bit X-archive-position: 1296 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: nscott@aconex.com Precedence: bulk X-list: pcp Changes committed to git://oss.sgi.com:8090/nathans/pcp.git src/pmie/pmie_check.sh | 2 +- src/pmlogctl/pmlogger_check.sh | 19 +++---------------- 2 files changed, 4 insertions(+), 17 deletions(-) commit 3120260bf927bde392565c080ba0b5ba80a3f5d2 Author: Nathan Scott Date: Wed Jul 4 09:38:33 2007 +1000 Remove extra sleep on successful pmlogger/pmie start, and an IRIX 5.3 snippet. From nscott@aconex.com Tue Jul 3 17:33:42 2007 Received: with ECARTIS (v1.0.0; list pcp); Tue, 03 Jul 2007 17:33:48 -0700 (PDT) Received: from postoffice.aconex.com (mail.app.aconex.com [203.89.192.138]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l640XftL032388 for ; Tue, 3 Jul 2007 17:33:42 -0700 Received: from edge.yarra.acx (unknown [203.89.192.141]) by postoffice.aconex.com (Postfix) with ESMTP id 0682492C5C5 for ; Wed, 4 Jul 2007 10:33:43 +1000 (EST) Subject: PCP QA test tarball update? 
From: Nathan Scott Reply-To: nscott@aconex.com To: pcp@oss.sgi.com Content-Type: text/plain Organization: Aconex Date: Wed, 04 Jul 2007 10:32:55 +1000 Message-Id: <1183509175.15488.346.camel@edge.yarra.acx> Mime-Version: 1.0 X-Mailer: Evolution 2.6.3 Content-Transfer-Encoding: 7bit X-archive-position: 1297 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: nscott@aconex.com Precedence: bulk X-list: pcp Hi guys, >From ftp://oss.sgi.com/projects/pcp/download/ there is a QA source tarball - "pcp-qa-1.3.tar.gz 1154 KB 02/12/05 00:00:00" which hasn't been updated for 18 months or so - is there a more recent version that could be made available for us coders making changes? Were there many differences between 2.7.1 QA and that original tarball - maybe a diff would suffice? thanks! -- Nathan From kimbrr@sgi.com Tue Jul 3 17:55:46 2007 Received: with ECARTIS (v1.0.0; list pcp); Tue, 03 Jul 2007 17:55:50 -0700 (PDT) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id l640tgtL006326 for ; Tue, 3 Jul 2007 17:55:44 -0700 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id KAA29294; Wed, 4 Jul 2007 10:55:39 +1000 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id l640taeW10754756; Wed, 4 Jul 2007 10:55:37 +1000 (AEST) Received: from localhost (kimbrr@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) with ESMTP id l640tYqq10793241; Wed, 4 Jul 2007 10:55:36 +1000 (AEST) X-Authentication-Warning: snort.melbourne.sgi.com: kimbrr owned process doing -bs Date: Wed, 4 Jul 2007 10:55:34 +1000 From: Michael Newton X-X-Sender: kimbrr@snort.melbourne.sgi.com To: Nathan Scott cc: pcp@oss.sgi.com Subject: Re: Review: PCP & pmlogger take too long 
to start In-Reply-To: <1183505491.15488.330.camel@edge.yarra.acx> Message-ID: References: <1182996127.15488.102.camel@edge.yarra.acx> <1183355238.15488.217.camel@edge.yarra.acx> <1183356141.15488.223.camel@edge.yarra.acx> <1183417678.15488.257.camel@edge.yarra.acx> <1183505491.15488.330.camel@edge.yarra.acx> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1298 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: kimbrr@sgi.com Precedence: bulk X-list: pcp On Wed, 4 Jul 2007, Nathan Scott wrote: > On Tue, 2007-07-03 at 20:19 +1000, Michael Newton wrote: > > i decided to repost as ive tried to take on board favouring > > readability & > > conciseness over keeping the diffs short > > > This isn't quite right still, and I'm confused as to why you are > looking for a halfway point between the old code and the simpler > version that I posted? In particular: > > > > > > =========================================================================== > > mgmt/pcp/src/pmcd/rc_pcp > > =========================================================================== > > > > --- a/mgmt/pcp/src/pmcd/rc_pcp 2007-07-03 20:16:14.000000000 +1000 > > +++ b/mgmt/pcp/src/pmcd/rc_pcp 2007-07-03 19:29:52.089858288 +1000 > > @@ -383,16 +383,19 @@ > > fi > > $ECHO $PCP_ECHO_N "Waiting for PMCD to > > terminate ...""$PCP_ECHO_C" > > gone=0 > > - for i in 1 2 3 4 5 6 > > + delay=200 > > + while [ $i -lt $delay ] > > do > > - sleep 3 > > + # dont sleep before 1st pid check, or after last > > + [ $i -eq 0 ] || pmsleep 0.1 > > + i=`expr $i + 1` > > This is what I meant with "halfway" - this keeps $i for no reason > AFAICT (am i missing something?) - $i was the 123456 loop counter > before, but now that we can control the loop using $delay, it's > redundant. 
>
> And why keep the sleep at the top of the loop, with that extra zero
> check, instead of having it as the end of the loop, and skipping out
> if we've started the process?

its so that you: "# dont sleep before 1st pid check, or after last"
..your version continues to have a final sleep which is not followed
by a check (in this case, of whether the proc has exited). Thats just
a delay to no effect.

If its "halfway", its because i tried to avoid that problem, but also
get rid of "while :". Im not greatly fond of a conditional thats only
going to trigger the 1st time through, but given that all its
protecting is a delay, it doesnt seem too bad. I find it strange that
you are so keen (quite rightly, im sure!) on ditching the
sleep-before-return that youve just announced the patch to remove, but
keep putting it in in the body of the loop!

youre right of course it cant be $delay %10 in my version: thats bad.
Also its true i could get by with 2 variables instead of 3.. but to
reduce it to one, its back to "while :". You can have
(1) while : + 1 variable + sleep at the end
OR u can have
(2) while [ delay cond ] with a variable protecting the sleep at the
start.
Trying to get both is going to be even sillier IMHO. I'll stop trying
to 2nd guess what youre going to like (at which i am manifestly
failing) and go with my own feeling that (1) is more readable than (2)

arguably using a special purpose $gone is more robust from a
maintenance PoV. For instance, if i changed my code only by removing
$gone [and changing $delay %10 to $i %10 of course!], it would
technically be incorrect in that if i happen to have found the pid
gone on the last iteration, it would say the process didnt die.. id
need to also move the $i inc down after the break.. a kind of
maintenance error ive seen plenty times. Still, its a bit swings &
roundabouts..
i wouldnt stress to defend that one > > + $VERBOSE && [ `expr $delay % 10` -eq 0 ] && $PCP_ECHO_PROG > > $PCP_ECHO_N ".""$PCP_ECHO_C" > > Little bug there - $delay is a constant, in your version, so > you will only ever print one '.'. In my version, delay was > used to count down to zero, so using $delay there was valid > for my patch, but not yours. > > > _get_pids_by_name pmcd >$tmp.tmp > > if [ ! -s $tmp.tmp ] > > then > > gone=1 > > break > > fi > > - $ECHO $PCP_ECHO_N ".""$PCP_ECHO_C" > > done > > if [ $gone != 1 ] # It just WON'T DIE, give up. > > And when using $delay as the loop control, we can also remove > the special $gone boolean flag, since we always know at the > end of the loop which way it ended. > > Was there something not working in my patch that it needed to > be reworked? This is what these loops should look like IMO > (and they should have the same control structure in all of the > scripts) - it is very easy to follow this loop structure, > compared to the other way(s) with the additional variables: > > delay=200 # tenths of a second > while [ $delay -gt 0 ] > do > if pmlock -v lock >$tmp.out > then > echo $dir/lock >$tmp.lock > break > else > [ -f $tmp.stamp ] || touch -t `pmdate -30M %Y%m%d%H%M` > $tmp.stamp > if [ -z "`find lock -newer $tmp.stamp -print > 2>/dev/null`" ] > then > echo "$prog: Warning: removing lock file older than > 30 minutes" > LC_TIME=POSIX ls -l $dir/lock > rm -f lock > fi > fi > pmsleep 0.1 > delay=`expr $delay - 1` > done > > if [ $delay -eq 0 ] > > > Also, as done above, this code should be removed entirely: > > + if uname -r | grep '^5\.3' >/dev/null > + then > + # IRIX 5.3 does not support -t for touch(1) > + # > + touch `pmdate -30M %m%d%H%M%y` $tmp.stamp > + else ok > It was OK when IRIX was the only platform, but now any platform > that has a 5.3 version will get caught in that code accidentally. > And IRIX 5.3 is ancient history, so just toss this special case. > I'll update my git tree with that patch shortly. 
..which patch also removes the sleep-before-returns.. so does that imply
youve been able to ask ken about it already?
or you'll just back it out (or find another solution) if he tells you
it was actually achieving something?

# longsufferingmode continues indefinitely..

Dr.Michael("Kimba")Newton  kimbrr@sgi.com

From nscott@aconex.com Tue Jul 3 18:46:50 2007
Received: with ECARTIS (v1.0.0; list pcp); Tue, 03 Jul 2007 18:46:55 -0700 (PDT)
Received: from postoffice.aconex.com (mail.app.aconex.com [203.89.192.138])
    by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l641kntL021236
    for ; Tue, 3 Jul 2007 18:46:50 -0700
Received: from edge.yarra.acx (unknown [203.89.192.141])
    by postoffice.aconex.com (Postfix) with ESMTP id 13D3292C53E;
    Wed, 4 Jul 2007 11:46:51 +1000 (EST)
Subject: Re: Review: PCP & pmlogger take too long to start
From: Nathan Scott
Reply-To: nscott@aconex.com
To: Michael Newton
Cc: pcp@oss.sgi.com
In-Reply-To:
References: <1182996127.15488.102.camel@edge.yarra.acx>
    <1183355238.15488.217.camel@edge.yarra.acx>
    <1183356141.15488.223.camel@edge.yarra.acx>
    <1183417678.15488.257.camel@edge.yarra.acx>
    <1183505491.15488.330.camel@edge.yarra.acx>
Content-Type: text/plain
Organization: Aconex
Date: Wed, 04 Jul 2007 11:46:02 +1000
Message-Id: <1183513563.15488.396.camel@edge.yarra.acx>
Mime-Version: 1.0
X-Mailer: Evolution 2.6.3
Content-Transfer-Encoding: 7bit
X-archive-position: 1299
X-ecartis-version: Ecartis v1.0.0
Sender: pcp-bounce@oss.sgi.com
Errors-to: pcp-bounce@oss.sgi.com
X-original-sender: nscott@aconex.com
Precedence: bulk
X-list: pcp

On Wed, 2007-07-04 at 10:55 +1000, Michael Newton wrote:
> its so that you: "# dont sleep before 1st pid check, or after last"
> ..your version continues to have a final sleep which is not followed
> by a check (in this case, of whether the proc has exited). Thats just
> a delay to no effect.

Light bulb goes on, I see how you're looking at it now - you're
concerned about before _and_ after...
(even though after doesn't matter), I thought you were hung up on
_before_ only.

So, in practice, that there extra sleep at the end is not really a
problem, right?  Thats the timing-out case - basically, we slept as
long as we allowed for (which is some arbitrary, very long time) - if
theres an extra 0.1 sec sleep after 10/20 seconds, it just does not
matter.

Take this minimal example, from rc_pcp, when stopping pmcd:

    $ECHO $PCP_ECHO_N "Waiting for PMCD to terminate ...""$PCP_ECHO_C"
    delay=200   # tenths of a second
    while [ $delay -gt 0 ]
    do
        _get_pids_by_name pmcd >$tmp.tmp
        [ ! -s $tmp.tmp ] && break
        pmsleep 0.1
        delay=`expr $delay - 1`
        [ `expr $delay % 10` -ne 0 ] || $ECHO $PCP_ECHO_N ".""$PCP_ECHO_C"
    done
    if [ $delay -eq 0 ]     # It just WON'T DIE, give up.

There's no initial sleeps obviously.  At the end of the day, if we
don't stop pmcd after 20 seconds (and it makes no difference if it
was 20.0, 19.1, or 20.1 seconds, really, after that amount of time
pmcd just ain't stopping) that other 0.1s is noise.

Its worth pointing out that both of your previous patches had bugs,
due to the additional complexity IMO - its just that they were more
complex and thus more likely to have something wrong, whereas this
other way is almost too simple to have anything go wrong (heh, heh,
heh - famous last words!  but zero bugs found so far...).

> arguably using a special purpose $gone is more robust from a
> maintenance PoV.

My way of looking at it is "implement things with the minimum of
state variables necessary" - this prevents the kind of
accidentally-using-the-wrong variable bugs that your earlier patches
both had.

> or you'll just back it out (or find another solution) if he tells you
> it was actually achieving something?

He's away - I'll run with it for awhile, see if anything happens
(which seems unlikely), and ask if he remembers anything when he
returns.  This way (having it in my tree) I make sure I don't forget
about it too.

cheers.
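
The single-counter polling loop discussed above can be sketched as a small standalone function. This is only a sketch under stated assumptions, not code from rc_pcp: wait_for_exit, POLL_INTERVAL and the probe argument are hypothetical names, plain sleep with a fractional argument stands in for pmsleep 0.1 (that assumes the local sleep accepts fractions), and POSIX arithmetic expansion replaces the expr calls.

```shell
#!/bin/sh
# Hypothetical poll interval; 0.1 mirrors the "tenths of a second" countdown.
POLL_INTERVAL=${POLL_INTERVAL:-0.1}

# wait_for_exit PROBE COUNT
#   PROBE is a command that succeeds while the target is still running.
#   COUNT is the number of polls allowed before giving up.
# $delay is the only state variable: it is greater than zero after the
# loop exactly when the loop ended via break (the target went away).
wait_for_exit()
{
    probe=$1
    delay=$2
    while [ "$delay" -gt 0 ]
    do
        "$probe" || break           # check first: no sleep before the first test
        sleep "$POLL_INTERVAL"      # stand-in for pmsleep 0.1
        delay=$((delay - 1))
    done
    [ "$delay" -gt 0 ]              # exit status 0: gone, 1: timed out
}
```

A caller would pass a probe that wraps its own process lookup and treat a non-zero return as the "it just won't die" case. As in the thread, a timeout still ends with one sleep that is not followed by a check.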
ps: email is a shitty communication channel for this sort of discussion - any interest in a public pcp IRC channel on one of the open source networks? I've started looking into setting that up, it works really well for #xfs on freenode.net. -- Nathan From kimbrr@sgi.com Wed Jul 4 00:15:39 2007 Received: with ECARTIS (v1.0.0; list pcp); Wed, 04 Jul 2007 00:15:43 -0700 (PDT) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id l647FatL006112 for ; Wed, 4 Jul 2007 00:15:37 -0700 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id RAA09754; Wed, 4 Jul 2007 17:15:27 +1000 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id l647FOeW10917213; Wed, 4 Jul 2007 17:15:25 +1000 (AEST) Received: from localhost (kimbrr@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) with ESMTP id l647FN2s10971664; Wed, 4 Jul 2007 17:15:23 +1000 (AEST) X-Authentication-Warning: snort.melbourne.sgi.com: kimbrr owned process doing -bs Date: Wed, 4 Jul 2007 17:15:23 +1000 From: Michael Newton X-X-Sender: kimbrr@snort.melbourne.sgi.com To: Mark Goodwin , Nathan Scott cc: pcp@oss.sgi.com Subject: Re: PCP start/stop script regression In-Reply-To: <1180066020.6273.593.camel@edge> Message-ID: References: <1180062348.6273.575.camel@edge> <1180066020.6273.593.camel@edge> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1300 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: kimbrr@sgi.com Precedence: bulk X-list: pcp On Fri, 25 May 2007, Nathan Scott wrote: > On Fri, 2007-05-25 at 13:45 +1000, Michael Newton wrote: > > On Fri, 25 May 2007, Nathan Scott wrote: > > > It looks like the pcp start script (src/pmcd/rc_pcp) has been > > 
> changed post pcp-2.5.x to use the file /var/run/pcp/pmcd.pid
> > > for decisions about whether pmcd is running or not.  Can you
> > > send details about the problem being solved by this change?
> >
> > * rpm -e pcp stops pcp
> > * kill by name means removing pcp in a chroot stops global pcp
> >   ..so make clean in mangrove would stop pcp on the build machine

[..the famous serial killer..]

> Ah, now it makes more sense.  Thanks.
>
> Would changing the mangrove build to do "rpm -e pcp --noscripts"
> resolve this and also allow the upgrade issue to be fixed?

this rang a bell, so now i finally had a minute to check back thru the
mail, i see ivan asked a similar qn. We should probably pursue this but it
means a change in linuxmeister/build/init_buildsystem -- which is
actually part of SuSE's build stuff. There is no current facility for
passing in --noscripts specifically, nor extra args in general.

Mark do you know how to go about getting such a change into linuxmeister?
Or recommend me someone to talk to? Im assuming we cant just change it
in our own copy? (in addition of course to suggesting same to SuSE)

2 possible stopgaps:
* restore the fallback to killall in oss, but not mangrove
* have pmcd log creation of the pidfile. Only call killall if there is
  a log file and it does *not* show creation of the pid file.
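
The second stopgap above could look roughly like the following. This is a sketch under assumptions only: the thread does not say what pmcd would write to its log, so PIDFILE_MARKER is a made-up string, and needs_killall_fallback is a hypothetical helper name rather than anything in rc_pcp.

```shell
#!/bin/sh
# Hypothetical marker text; the actual log message is not specified anywhere
# in this thread.
PIDFILE_MARKER="created pidfile"

# needs_killall_fallback LOGFILE
#   Succeeds (exit 0) only when LOGFILE exists and does NOT mention pidfile
#   creation, i.e. the pmcd that wrote the log predates the pidfile scheme.
#   Fails when there is no log at all, or when the log shows the marker.
needs_killall_fallback()
{
    logfile=$1
    [ -f "$logfile" ] || return 1
    if grep -F -- "$PIDFILE_MARKER" "$logfile" >/dev/null 2>&1
    then
        return 1
    fi
    return 0
}
```

The stop path would then kill by name only when this check succeeds, and rely on the pidfile otherwise.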
Dr.Michael("Kimba")Newton kimbrr@sgi.com From markgw@sgi.com Wed Jul 4 14:39:31 2007 Received: with ECARTIS (v1.0.0; list pcp); Wed, 04 Jul 2007 14:39:36 -0700 (PDT) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id l64LdStL023586 for ; Wed, 4 Jul 2007 14:39:30 -0700 Received: from [134.15.251.7] (melb-sw-corp-251-7.corp.sgi.com [134.15.251.7]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id HAA27784; Thu, 5 Jul 2007 07:39:21 +1000 Message-ID: <468C1379.8030200@sgi.com> Date: Thu, 05 Jul 2007 07:39:05 +1000 From: Mark Goodwin Reply-To: markgw@sgi.com Organization: SGI Engineering User-Agent: Thunderbird 1.5.0.12 (Windows/20070509) MIME-Version: 1.0 To: Michael Newton CC: Nathan Scott , pcp@oss.sgi.com Subject: Re: PCP start/stop script regression References: <1180062348.6273.575.camel@edge> <1180066020.6273.593.camel@edge> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 1301 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: markgw@sgi.com Precedence: bulk X-list: pcp Michael Newton wrote: > Mark do you know how to go about getting such a change into linuxmeister? open a bug against stoutlinux. BTW, such discussions probably belong off list .. 
Cheers -- Mark From nscott@aconex.com Thu Jul 5 15:58:23 2007 Received: with ECARTIS (v1.0.0; list pcp); Thu, 05 Jul 2007 15:58:31 -0700 (PDT) Received: from postoffice.aconex.com (mail.app.aconex.com [203.89.192.138]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l65MwLtL029918 for ; Thu, 5 Jul 2007 15:58:23 -0700 Received: from edge.yarra.acx (unknown [203.89.192.141]) by postoffice.aconex.com (Postfix) with ESMTP id B22D192C466 for ; Fri, 6 Jul 2007 08:58:21 +1000 (EST) Subject: kmchart updates From: Nathan Scott Reply-To: nscott@aconex.com To: pcp@oss.sgi.com Content-Type: text/plain Organization: Aconex Date: Fri, 06 Jul 2007 08:57:38 +1000 Message-Id: <1183676258.15488.419.camel@edge.yarra.acx> Mime-Version: 1.0 X-Mailer: Evolution 2.6.3 Content-Transfer-Encoding: 7bit X-archive-position: 1302 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: nscott@aconex.com Precedence: bulk X-list: pcp Changes committed to git://oss.sgi.com:8090/nathans/kmchart.git Makefile | 2 README | 14 - images/Makefile | 16 + images/back_archive.png |binary images/back_archive.svg | 297 +++++++++++++-------- images/back_archive.xpm | 251 ----------------- images/fastback_archive.png |binary images/fastback_archive.svg | 424 ++++++++++++++---------------- images/fastback_archive.xpm | 271 ------------------- images/fastfwd_archive.png |binary images/fastfwd_archive.svg | 421 ++++++++++++++---------------- images/fastfwd_archive.xpm | 273 ------------------- images/play_archive.png |binary images/play_archive.svg | 205 ++++++++++---- images/play_archive.xpm | 241 ----------------- images/play_live.png |binary images/play_live.svg | 257 +++++++++--------- images/play_live.xpm | 282 -------------------- images/play_record.png |binary images/play_record.svg | 563 +++++++++++++++++++++++++++++++++------- images/play_record.xpm | 283 -------------------- images/stepback_archive.png |binary 
images/stepback_archive.svg | 573 ++++++++++++++--------------------------- images/stepback_archive.xpm | 299 --------------------- images/stepfwd_archive.png |binary images/stepfwd_archive.svg | 513 ++++++++++++++---------------------- images/stepfwd_archive.xpm | 267 ------------------- images/stop_archive.png |binary images/stop_archive.svg | 213 ++++++++++----- images/stop_archive.xpm | 230 ---------------- images/stop_live.png |binary images/stop_live.svg | 479 ++++++++++++++++++++++++++++------ images/stop_live.xpm | 251 ----------------- images/stop_record.png |binary images/stop_record.svg | 492 +++++++++++++++++++++++++++-------- images/stop_record.xpm | 271 ------------------- images/timebackarchive.xpm | 36 -- images/timefastbackarchive.xpm | 36 -- images/timefastfwdarchive.xpm | 36 -- images/timeplayarchive.xpm | 36 -- images/timeplaylive.xpm | 36 -- images/timeplayrecord.xpm | 37 -- images/timestepbackarchive.xpm | 36 -- images/timestepfwdarchive.xpm | 36 -- images/timestoparchive.xpm | 36 -- images/timestoplive.xpm | 36 -- images/timestoprecord.xpm | 37 -- src/chart/kmchart.pro | 21 + src/chart/kmchart.ui | 37 +- src/chart/kmchart.ui.h | 66 ---- src/chart/main.h | 8 src/chart/tab.cpp | 29 +- src/chart/tab.h | 28 -- src/chart/timebutton.cpp | 79 +++++ src/chart/timebutton.h | 61 ++++ 55 files changed, 2911 insertions(+), 5204 deletions(-) commit ad7ccf826e8512f677f4049b4819dc0ed1e23216 Author: Nathan Scott Date: Fri Jul 6 07:09:57 2007 +1000 Remove old-school time button images, not used for ages. commit 14ff85b694c3378b239ddd52b2fb9280fc36bc16 Author: Nathan Scott Date: Fri Jul 6 07:07:09 2007 +1000 Update README to document a couple of known bugs, and fix a typo. commit 3a6e5e8bac8202af3c063462954b3d7a01c2aa91 Author: Nathan Scott Date: Fri Jul 6 07:06:11 2007 +1000 Make lighting shade more consistently across time button images. 
commit 9165af22a69708195809ab8c72bd461d4c44c720 Author: Nathan Scott Date: Thu Jul 5 13:14:20 2007 +1000 Fix up gradients and opacity on vcr direction triangles. commit eff96fdc78d7c80ec46c54856a4e040e0e16ac22 Author: Nathan Scott Date: Thu Jul 5 12:53:15 2007 +1000 Update live mode time button pixmaps as well. commit fd3678f6e89afedeab55dc8e1fb37f32fa4adba2 Author: Nathan Scott Date: Thu Jul 5 11:51:27 2007 +1000 Funky etched look to the static time button text. commit 9df79b2e085589c4ed7f6300ecbcf1a3f683d8da Author: Nathan Scott Date: Thu Jul 5 11:27:24 2007 +1000 Add images subdir into the top level SUBDIRS macro. commit ad30f48c9c8b5f8490c9c3d85e46c00ced046a89 Author: Nathan Scott Date: Thu Jul 5 11:08:18 2007 +1000 Add a Makefile for images subdir, for source package builds. commit 355483b2a3ef398ac01f63213f37ebb6c9ec138a Author: Nathan Scott Date: Thu Jul 5 11:03:29 2007 +1000 Improve time state button images, fix seeking in archives from kmtime. From markgw@sgi.com Thu Jul 5 17:48:44 2007 Received: with ECARTIS (v1.0.0; list pcp); Thu, 05 Jul 2007 17:48:49 -0700 (PDT) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id l660metL022886 for ; Thu, 5 Jul 2007 17:48:42 -0700 Received: from [134.14.55.19] (dhcp19.melbourne.sgi.com [134.14.55.19]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id KAA04798; Fri, 6 Jul 2007 10:48:38 +1000 Message-ID: <468D915A.7010301@sgi.com> Date: Fri, 06 Jul 2007 10:48:26 +1000 From: Mark Goodwin Reply-To: markgw@sgi.com Organization: SGI Engineering User-Agent: Thunderbird 1.5.0.12 (Windows/20070509) MIME-Version: 1.0 To: nscott@aconex.com CC: pcp@oss.sgi.com Subject: Re: PCP QA test tarball update? 
References: <1183509175.15488.346.camel@edge.yarra.acx> In-Reply-To: <1183509175.15488.346.camel@edge.yarra.acx> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 1303 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: markgw@sgi.com Precedence: bulk X-list: pcp Nathan Scott wrote: > Hi guys, > >>From ftp://oss.sgi.com/projects/pcp/download/ there is a QA > source tarball - "pcp-qa-1.3.tar.gz 1154 KB 02/12/05 00:00:00" > which hasn't been updated for 18 months or so - is there a > more recent version that could be made available for us coders > making changes? Were there many differences between 2.7.1 QA > and that original tarball - maybe a diff would suffice? no we don't have a newer version of this, but we probably should. It's something Ken was looking after back then. If it's just a tarball of the oss qa directory, then it should be pretty easy to push a new version up to oss .. Michael? 
Cheers -- Mark From nscott@aconex.com Thu Jul 5 17:58:06 2007 Received: with ECARTIS (v1.0.0; list pcp); Thu, 05 Jul 2007 17:58:13 -0700 (PDT) Received: from postoffice.aconex.com (mail.app.aconex.com [203.89.192.138]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l660w3tL025509 for ; Thu, 5 Jul 2007 17:58:05 -0700 Received: from edge.yarra.acx (unknown [203.89.192.141]) by postoffice.aconex.com (Postfix) with ESMTP id CB79692C550 for ; Fri, 6 Jul 2007 10:58:05 +1000 (EST) Subject: PCP IRC channel created From: Nathan Scott Reply-To: nscott@aconex.com To: pcp@oss.sgi.com In-Reply-To: <1183513563.15488.396.camel@edge.yarra.acx> References: <1182996127.15488.102.camel@edge.yarra.acx> <1183355238.15488.217.camel@edge.yarra.acx> <1183356141.15488.223.camel@edge.yarra.acx> <1183417678.15488.257.camel@edge.yarra.acx> <1183505491.15488.330.camel@edge.yarra.acx> <1183513563.15488.396.camel@edge.yarra.acx> Content-Type: text/plain Organization: Aconex Date: Fri, 06 Jul 2007 10:57:22 +1000 Message-Id: <1183683442.15488.430.camel@edge.yarra.acx> Mime-Version: 1.0 X-Mailer: Evolution 2.6.3 Content-Transfer-Encoding: 7bit X-archive-position: 1304 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: nscott@aconex.com Precedence: bulk X-list: pcp On Wed, 2007-07-04 at 11:46 +1000, Nathan Scott wrote: > ... any interest in a public pcp IRC channel on one of > the open source networks? We now have a #pcp channel - its hosted on irc.oftc.net. Refer to http://www.oftc.net/ for details of this network. Its open to everyone/anyone to join. See ya there. cheers. 
-- Nathan From kimbrr@sgi.com Thu Jul 5 18:02:10 2007 Received: with ECARTIS (v1.0.0; list pcp); Thu, 05 Jul 2007 18:02:15 -0700 (PDT) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id l66127tL026539 for ; Thu, 5 Jul 2007 18:02:09 -0700 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA05097; Fri, 6 Jul 2007 11:02:01 +1000 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id l6611xeW13354516; Fri, 6 Jul 2007 11:02:00 +1000 (AEST) Received: from localhost (kimbrr@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) with ESMTP id l6611w7s13356095; Fri, 6 Jul 2007 11:01:58 +1000 (AEST) X-Authentication-Warning: snort.melbourne.sgi.com: kimbrr owned process doing -bs Date: Fri, 6 Jul 2007 11:01:58 +1000 From: Michael Newton X-X-Sender: kimbrr@snort.melbourne.sgi.com To: Mark Goodwin cc: nscott@aconex.com, pcp@oss.sgi.com Subject: Re: PCP QA test tarball update? In-Reply-To: <468D915A.7010301@sgi.com> Message-ID: References: <1183509175.15488.346.camel@edge.yarra.acx> <468D915A.7010301@sgi.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1305 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: kimbrr@sgi.com Precedence: bulk X-list: pcp On Fri, 6 Jul 2007, Mark Goodwin wrote: > Nathan Scott wrote: > >>From ftp://oss.sgi.com/projects/pcp/download/ there is a QA > > source tarball - "pcp-qa-1.3.tar.gz 1154 KB 02/12/05 00:00:00" > > which hasn't been updated for 18 months or so - is there a > > more recent version that could be made available for us coders > > making changes? Were there many differences between 2.7.1 QA > > and that original tarball - maybe a diff would suffice? 
> > no we don't have a newer version of this, but we probably should. > It's something Ken was looking after back then. If it's just a > tarball of the oss qa directory, then it should be pretty easy > to push a new version up to oss .. Michael? well if you dont know i certainly dont! but if thats all thats needed, of course i can do it Dr.Michael("Kimba")Newton kimbrr@sgi.com From nscott@aconex.com Thu Jul 5 21:32:22 2007 Received: with ECARTIS (v1.0.0; list pcp); Thu, 05 Jul 2007 21:32:27 -0700 (PDT) Received: from postoffice.aconex.com (mail.app.aconex.com [203.89.192.138]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l664WKtL007851 for ; Thu, 5 Jul 2007 21:32:22 -0700 Received: from edge.yarra.acx (unknown [203.89.192.141]) by postoffice.aconex.com (Postfix) with ESMTP id EEF3192C2CC for ; Fri, 6 Jul 2007 14:32:22 +1000 (EST) Subject: kmchart updates From: Nathan Scott Reply-To: nscott@aconex.com To: pcp@oss.sgi.com Content-Type: text/plain Organization: Aconex Date: Fri, 06 Jul 2007 14:31:40 +1000 Message-Id: <1183696300.15488.443.camel@edge.yarra.acx> Mime-Version: 1.0 X-Mailer: Evolution 2.6.3 Content-Transfer-Encoding: 7bit X-archive-position: 1306 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: nscott@aconex.com Precedence: bulk X-list: pcp Changes committed to git://oss.sgi.com:8090/nathans/kmchart.git Makefile | 4 aclocal.m4 | 49 -- configure.in | 6 images/back_off.png |binary images/back_off.svg | 816 +++++++++++++++++++++++++++++++++++++++++--- images/back_off.xpm | 157 -------- images/back_on.png |binary images/back_on.svg | 856 +++++++++++++++++++++++++++++++++++++++++------ images/back_on.xpm | 188 ---------- images/fastback_off.png |binary images/fastback_off.svg | 840 ++++++++++++++++++++++++++++++++++++++++++---- images/fastback_off.xpm | 254 ------------- images/fastback_on.png |binary images/fastback_on.svg | 842 ++++++++++++++++++++++++++++++++++++++++++---- 
images/fastback_on.xpm | 313 ----------------- images/fastfwd_off.png |binary images/fastfwd_off.svg | 810 ++++++++++++++++++++++++++++++++++++++++---- images/fastfwd_off.xpm | 251 ------------- images/fastfwd_on.png |binary images/fastfwd_on.svg | 816 ++++++++++++++++++++++++++++++++++++++++---- images/fastfwd_on.xpm | 313 ----------------- images/play_off.png |binary images/play_off.svg | 776 ++++++++++++++++++++++++++++++++++++++---- images/play_off.xpm | 158 -------- images/play_on.png |binary images/play_on.svg | 766 ++++++++++++++++++++++++++++++++++++++---- images/play_on.xpm | 184 ---------- images/stepback_off.png |binary images/stepback_off.svg | 385 +++++++-------------- images/stepback_off.xpm | 265 -------------- images/stepback_on.png |binary images/stepback_on.svg | 219 +++++------- images/stepback_on.xpm | 322 ----------------- images/stepfwd_off.png |binary images/stepfwd_off.svg | 105 ++--- images/stepfwd_off.xpm | 266 -------------- images/stepfwd_on.png |binary images/stepfwd_on.svg | 195 ++++------ images/stepfwd_on.xpm | 322 ----------------- images/stop_off.png |binary images/stop_off.svg | 30 - images/stop_off.xpm | 126 ------ images/stop_on.png |binary images/stop_on.svg | 51 +- images/stop_on.xpm | 145 ------- m4/package_qtdev.m4 | 3 src/chart/GNUmakefile | 4 src/include/builddefs.in | 45 -- src/time/GNUmakefile | 4 src/time/kmtime.pro | 34 - src/time/kmtimearch.ui.h | 94 ++--- src/time/kmtimelive.ui.h | 42 +- src/time/main.cpp | 32 + src/time/main.h | 21 + 54 files changed, 6492 insertions(+), 4617 deletions(-) commit 83c16336d9051fe52f18fb8c991d81005e5a5074 Author: Nathan Scott Date: Fri Jul 6 14:23:51 2007 +1000 Make configure ensure Qt build tools are present and Makefiles use tool macros. commit f60fa9d9172506d8e37a139d0164a9fb5bac41fd Author: Nathan Scott Date: Fri Jul 6 13:50:53 2007 +1000 Fix make install target so binaries default install to the "right" place. 
commit dfab4f8e0113b764982c3561aa2be81cf8de1a00 Author: Nathan Scott Date: Fri Jul 6 13:49:42 2007 +1000 Update kmtime images to be consistent with kmchart; removing last .xpm use. From kimbrr@sgi.com Sun Jul 8 22:03:11 2007 Received: with ECARTIS (v1.0.0; list pcp); Sun, 08 Jul 2007 22:03:19 -0700 (PDT) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id l69537tL012390 for ; Sun, 8 Jul 2007 22:03:09 -0700 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA01578; Mon, 9 Jul 2007 15:03:08 +1000 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id l69537eW17031923; Mon, 9 Jul 2007 15:03:07 +1000 (AEST) Received: from localhost (kimbrr@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) with ESMTP id l69536Ba17029999; Mon, 9 Jul 2007 15:03:07 +1000 (AEST) X-Authentication-Warning: snort.melbourne.sgi.com: kimbrr owned process doing -bs Date: Mon, 9 Jul 2007 15:03:06 +1000 From: Michael Newton X-X-Sender: kimbrr@snort.melbourne.sgi.com To: pcp@oss.sgi.com, pcp-announce@sgi.com, pcp-dev@sgi.com Subject: Re: [ANNOUNCE] PCP QA suite 1.4 available In-Reply-To: Message-ID: References: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1307 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: kimbrr@sgi.com Precedence: bulk X-list: pcp SGI is pleased to announce a new version of the QA suite for the Performance Co-Pilot (PCP) open source (version 1.4) is now available for download from : ftp://oss.sgi.com/projects/pcp/download in pcp-qa-1.4.tar.gz Dr.Michael("Kimba")Newton kimbrr@sgi.com From nscott@aconex.com Tue Jul 10 15:35:10 2007 Received: with ECARTIS (v1.0.0; list pcp); Tue, 10 Jul 2007 15:35:16 
-0700 (PDT) Received: from postoffice.aconex.com (mail.app.aconex.com [203.89.192.138]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l6AMZ9bm001505 for ; Tue, 10 Jul 2007 15:35:10 -0700 Received: from edge.yarra.acx (unknown [203.89.192.141]) by postoffice.aconex.com (Postfix) with ESMTP id C5BA792C3F4 for ; Wed, 11 Jul 2007 08:35:09 +1000 (EST) Subject: kmchart updates From: Nathan Scott Reply-To: nscott@aconex.com To: pcp@oss.sgi.com Content-Type: text/plain Organization: Aconex Date: Wed, 11 Jul 2007 08:34:39 +1000 Message-Id: <1184106879.15488.466.camel@edge.yarra.acx> Mime-Version: 1.0 X-Mailer: Evolution 2.6.3 Content-Transfer-Encoding: 7bit X-archive-position: 1308 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: nscott@aconex.com Precedence: bulk X-list: pcp Changes committed to git://oss.sgi.com:8090/nathans/kmchart.git VERSION | 2 doc/CHANGES | 19 - images/dialog-archive.png |binary images/dialog-archive.svg | 580 ++++++++++++++++++++++++++++++++ images/dialog-error.png |binary images/dialog-error.svg | 209 +++++++++++ images/dialog-host.png |binary images/dialog-host.svg | 741 ++++++++++++++++++++++++++++++++++++++++++ images/dialog-information.png |binary images/dialog-information.svg | 668 +++++++++++++++++++++++++++++++++++++ images/dialog-question.png |binary images/dialog-question.svg | 166 +++++++++ images/dialog-warning.png |binary images/dialog-warning.svg | 222 ++++++++++++ man/man1/kmquery.1 | 253 ++++++++++++++ src/Makefile | 4 src/chart/aboutdialog.ui | 2 src/chart/main.h | 2 src/query/GNUmakefile | 26 + src/query/kmconfirm.sh | 4 src/query/kmmessage.sh | 4 src/query/kmquery.cpp | 268 +++++++++++++++ src/query/kmquery.h | 84 ++++ src/query/kmquery.pro | 23 + src/query/main.cpp | 270 +++++++++++++++ src/time/aboutdialog.ui | 2 26 files changed, 3530 insertions(+), 19 deletions(-) commit acf804ce733d860e7848ea27560061765a9f0a97 Author: Nathan Scott Date: Wed 
Jul 11 08:27:48 2007 +1000 Bump kmchart version to 0.7.0 commit 641d03117cbfd1d74705313caf83f09debc4fdd9 Author: Nathan Scott Date: Wed Jul 11 08:22:15 2007 +1000 Add kmquery(1), a Qt-based xconfirm/xmessage replacement (and more). From nscott@aconex.com Tue Jul 10 18:04:40 2007 Received: with ECARTIS (v1.0.0; list pcp); Tue, 10 Jul 2007 18:04:45 -0700 (PDT) Received: from postoffice.aconex.com (mail.app.aconex.com [203.89.192.138]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l6B14bbm004481 for ; Tue, 10 Jul 2007 18:04:39 -0700 Received: from edge.yarra.acx (unknown [203.89.192.141]) by postoffice.aconex.com (Postfix) with ESMTP id B548C92C718; Wed, 11 Jul 2007 11:04:39 +1000 (EST) Subject: Re: PCP start/stop script regression From: Nathan Scott Reply-To: nscott@aconex.com To: Michael Newton Cc: Mark Goodwin , pcp@oss.sgi.com In-Reply-To: References: <1180062348.6273.575.camel@edge> <1180066020.6273.593.camel@edge> Content-Type: text/plain Organization: Aconex Date: Wed, 11 Jul 2007 11:04:09 +1000 Message-Id: <1184115849.15488.475.camel@edge.yarra.acx> Mime-Version: 1.0 X-Mailer: Evolution 2.6.3 Content-Transfer-Encoding: 7bit X-archive-position: 1309 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: nscott@aconex.com Precedence: bulk X-list: pcp On Wed, 2007-07-04 at 17:15 +1000, Michael Newton wrote: > > mail, i see ivan asked a similar qn. We should probably pursue this > but it > means a change in linuxmeister/build/init_buildsystem -- which is > actually part of SuSE's build stuff. There is no current facility for > passing in --noscripts specifically, nor extra args in general. You may be able to override the definition of rpm itself (is it called via a macro?) - to be RPM="rpm --noscripts" ... hard to say without seeing the scripts though. 
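
For illustration, the override idea reads something like this. It is purely a sketch: init_buildsystem is not shown anywhere in this thread, so the RPM variable and the remove_package helper are hypothetical and the real script may be structured quite differently. The only thing taken as given is that rpm accepts --noscripts.

```shell
#!/bin/sh
# Hypothetical override point: callers may export RPM="rpm --noscripts".
RPM=${RPM:-rpm}

# remove_package NAME
#   Erase a package using whatever $RPM expands to at call time.
#   $RPM is deliberately left unquoted so that "rpm --noscripts" splits
#   into a command word plus its option.
remove_package()
{
    $RPM -e "$1"
}
```

With RPM="rpm --noscripts" in the environment, remove_package pcp would run rpm --noscripts -e pcp, so the package's own stop scripts would not be invoked during a chroot build.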
> 2 possible stopgaps: > * restore the fallback to killall in oss, but not mangrove If that route were taken, theres no need for pmcd.pid anymore, it should all just be backed out. Not sure I'd favor that though, its painful to maintain a separate patch, and effectively reduces testing coverage from both sides of the mangrove/oss fence. > * have pmcd log creation of the pidfile. Only call killall if there is > a log file and it does *not* show creation of the pid file. That'd probably work, I guess. Just looking at that other patch you sent out now, which looks like a third option... (doesn't seem to do either of the above things afaict). cheers. -- Nathan From kimbrr@sgi.com Tue Jul 10 18:25:17 2007 Received: with ECARTIS (v1.0.0; list pcp); Tue, 10 Jul 2007 18:25:22 -0700 (PDT) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id l6B1PEbm009570 for ; Tue, 10 Jul 2007 18:25:16 -0700 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA17384; Wed, 11 Jul 2007 11:25:09 +1000 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id l6B1P7eW19524111; Wed, 11 Jul 2007 11:25:08 +1000 (AEST) Received: from localhost (kimbrr@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) with ESMTP id l6B1P6Oo19420780; Wed, 11 Jul 2007 11:25:07 +1000 (AEST) X-Authentication-Warning: snort.melbourne.sgi.com: kimbrr owned process doing -bs Date: Wed, 11 Jul 2007 11:25:06 +1000 From: Michael Newton X-X-Sender: kimbrr@snort.melbourne.sgi.com To: Nathan Scott cc: Mark Goodwin , pcp@oss.sgi.com Subject: Re: PCP start/stop script regression In-Reply-To: <1184115849.15488.475.camel@edge.yarra.acx> Message-ID: References: <1180062348.6273.575.camel@edge> <1180066020.6273.593.camel@edge> 
<1184115849.15488.475.camel@edge.yarra.acx> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1310 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: kimbrr@sgi.com Precedence: bulk X-list: pcp On Wed, 11 Jul 2007, Nathan Scott wrote: > On Wed, 2007-07-04 at 17:15 +1000, Michael Newton wrote: > > > > mail, i see ivan asked a similar qn. We should probably pursue this > > but it > > means a change in linuxmeister/build/init_buildsystem -- which is > > actually part of SuSE's build stuff. There is no current facility for > > passing in --noscripts specifically, nor extra args in general. > > You may be able to override the definition of rpm itself (is it called > via a macro?) - to be RPM="rpm --noscripts" ... hard to say without > seeing the scripts though. no, no macro. I suppose its possible theres an alias being set-up at some intermediate level, but im not going looking for that.. hopefully whoever gets to look at the PV i raised will know > > 2 possible stopgaps: > > * restore the fallback to killall in oss, but not mangrove > > If that route were taken, theres no need for pmcd.pid anymore, it should > all just be backed out. Not sure I'd favor that though, its painful to > maintain a separate patch, and effectively reduces testing coverage from > both sides of the mangrove/oss fence. i agree > > * have pmcd log creation of the pidfile. Only call killall if there is > > a log file and it does *not* show creation of the pid file. > > That'd probably work, I guess. Just looking at that other patch you > sent out now, which looks like a third option... (doesn't seem to do > either of the above things afaict). no.. 
i started out to do the above, but it evolved:) From nscott@aconex.com Tue Jul 10 23:42:20 2007 Received: with ECARTIS (v1.0.0; list pcp); Tue, 10 Jul 2007 23:42:24 -0700 (PDT) Received: from postoffice.aconex.com (mail.app.aconex.com [203.89.192.138]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l6B6gIbm009926 for ; Tue, 10 Jul 2007 23:42:19 -0700 Received: from edge.yarra.acx (unknown [203.89.192.141]) by postoffice.aconex.com (Postfix) with ESMTP id 1578F92C2CC for ; Wed, 11 Jul 2007 16:42:21 +1000 (EST) Subject: pcp updates From: Nathan Scott Reply-To: nscott@aconex.com To: pcp@oss.sgi.com Content-Type: text/plain Organization: Aconex Date: Wed, 11 Jul 2007 16:41:51 +1000 Message-Id: <1184136111.15488.501.camel@edge.yarra.acx> Mime-Version: 1.0 X-Mailer: Evolution 2.6.3 Content-Transfer-Encoding: 7bit X-archive-position: 1311 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: nscott@aconex.com Precedence: bulk X-list: pcp Changes committed to git://oss.sgi.com:8090/nathans/pcp.git VERSION.pcp | 2 +- src/pmcd/rc_pcp | 10 ++++++++++ src/pmdas/windows/data.c | 10 ++++++++++ src/pmdas/windows/pmns.sqlserver | 2 ++ 4 files changed, 23 insertions(+), 1 deletion(-) commit d2ef88eeab3c5bcd5f796766c872076e4d10efb9 Author: Nathan Scott Date: Wed Jul 11 16:34:45 2007 +1000 Append current date string to build number commit 93357eccccd34b71882fcb98d0ab77d7fca138a8 Author: Nathan Scott Date: Wed Jul 11 16:32:53 2007 +1000 Add sqlserver active_transactions metrics to the Windows PMDA. commit d85a5d3e3703e20c8e42f2a6c26e946a65f7f77d Author: Michael Newton Date: Wed Jul 11 11:31:34 2007 +1000 Check if pmcd is running first in _shutdown, to get the status reporting right. 
From kimbrr@sgi.com Wed Jul 11 17:53:52 2007 Received: with ECARTIS (v1.0.0; list pcp); Wed, 11 Jul 2007 17:53:56 -0700 (PDT) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id l6C0rlbm009149 for ; Wed, 11 Jul 2007 17:53:49 -0700 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id KAA04919 for ; Thu, 12 Jul 2007 10:53:49 +1000 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id l6C0rmeW20741283 for ; Thu, 12 Jul 2007 10:53:49 +1000 (AEST) Received: from localhost (kimbrr@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) with ESMTP id l6C0rlsr20682514 for ; Thu, 12 Jul 2007 10:53:48 +1000 (AEST) X-Authentication-Warning: snort.melbourne.sgi.com: kimbrr owned process doing -bs Date: Thu, 12 Jul 2007 10:53:47 +1000 From: Michael Newton X-X-Sender: kimbrr@snort.melbourne.sgi.com To: pcp@oss.sgi.com Subject: patch pcp stop to exit early when no pmcd running Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1312 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: kimbrr@sgi.com Precedence: bulk X-list: pcp Here is a patch to have /etc/init.d/pcp stop return early when there is no pmcd running Nathan has already reviewed, but will consider comments for future incorporation --- a/mgmt/pcp/src/pmcd/rc_pcp 2007-07-12 10:49:42.000000000 +1000 +++ b/mgmt/pcp/src/pmcd/rc_pcp 2007-07-12 10:47:28.050587091 +1000 @@ -372,6 +372,16 @@ _shutdown() { + # Is pmcd running? + # + _get_pids_by_name pmcd >$tmp.tmp + if [ ! -s $tmp.tmp ] + then + echo "$prog: PMCD not running" + rm -f $PCP_RUN_DIR/pmcd.pid + return 0 + fi + # Send pmcd a SIGTERM, which is noted as a pending shutdown. 
# When finished the currently active request, pmcd will close any # connections, wait for any agents, and then exit. Dr.Michael("Kimba")Newton kimbrr@sgi.com From kimbrr@sgi.com Wed Jul 11 18:28:15 2007 Received: with ECARTIS (v1.0.0; list pcp); Wed, 11 Jul 2007 18:28:21 -0700 (PDT) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id l6C1SCbm027486 for ; Wed, 11 Jul 2007 18:28:14 -0700 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA06068 for ; Thu, 12 Jul 2007 11:28:13 +1000 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id l6C1SCeW20740601 for ; Thu, 12 Jul 2007 11:28:13 +1000 (AEST) Received: from localhost (kimbrr@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) with ESMTP id l6C1SBqM20741421 for ; Thu, 12 Jul 2007 11:28:12 +1000 (AEST) X-Authentication-Warning: snort.melbourne.sgi.com: kimbrr owned process doing -bs Date: Thu, 12 Jul 2007 11:28:11 +1000 From: Michael Newton X-X-Sender: kimbrr@snort.melbourne.sgi.com To: pcp@oss.sgi.com Subject: patch: ensure the $PCP_RUN_DIR exists Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1313 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: kimbrr@sgi.com Precedence: bulk X-list: pcp For his patch Fix PCP start script regressions Nathan wrote : Also ensure the $PCP_RUN_DIR exists, else pmcd fails to start from this script on Windows at least. 
Im separatng this into a separate patch as i need to deal differently with the other issues --- a/mgmt/pcp/src/pmcd/rc_pcp 2007-07-12 11:21:55.000000000 +1000 +++ b/mgmt/pcp/src/pmcd/rc_pcp 2007-07-12 11:21:18.833873157 +1000 @@ -500,6 +500,7 @@ Error: PMCD control file '"$PCP_PMCDCONF_PATH"' is missing, cannot start PMCD.' exit fi + [ ! -d $PCP_RUN_DIR ] && mkdir -p $PCP_RUN_DIR [ ! -d $RUNDIR ] && mkdir -p $RUNDIR cd $RUNDIR Dr.Michael("Kimba")Newton kimbrr@sgi.com From jason.rappleye@gmail.com Mon Jul 16 08:39:29 2007 Received: with ECARTIS (v1.0.0; list pcp); Mon, 16 Jul 2007 17:03:11 -0700 (PDT) Received: from wa-out-1112.google.com (wa-out-1112.google.com [209.85.146.179]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l6GFdRbm030731 for ; Mon, 16 Jul 2007 08:39:29 -0700 Received: by wa-out-1112.google.com with SMTP id k22so1778671waf for ; Mon, 16 Jul 2007 08:39:30 -0700 (PDT) DKIM-Signature: a=rsa-sha1; c=relaxed/relaxed; d=gmail.com; s=beta; h=domainkey-signature:received:received:message-id:date:from:to:subject:mime-version:content-type; b=RmIbvO5wmmOvJ3PNxKR9VTJCuxJKBzJAdYqSNVwL+ddOKplBmz51CQYGos+PRMfFWDZLMPa1RtyTiu9dGFB01UI9UIaWD6FjMkbbXLWIk8s957+YYRsYadMNqpQEcVxyWRWo2854+dOWAO8ssWvgt1hGX6u9U2AFe3iadthDx+Y= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=received:message-id:date:from:to:subject:mime-version:content-type; b=I3EthTSf1CIftUB+Do6gvrGwpicznSKBTpfr74Ti/ooBPrQD4tFBnMx4qNxTaeI2PFYbd7Wd2++isG1MZePxS/vmuwQ9JwJEJfuq1P0QUC3S/xiOIwg6kUbarpVtVdr9p40a6KD/ZXhBWYeq7w9Pfng3ZVLZ9umQAmCJgcQ1Muw= Received: by 10.115.75.1 with SMTP id c1mr4221328wal.1184598892020; Mon, 16 Jul 2007 08:14:52 -0700 (PDT) Received: by 10.115.72.10 with HTTP; Mon, 16 Jul 2007 08:14:51 -0700 (PDT) Message-ID: <6ccb8bf80707160814u3093b234l597ce029b5687fb6@mail.gmail.com> Date: Mon, 16 Jul 2007 11:14:51 -0400 From: "Jason Rappleye" To: pcp@oss.sgi.com Subject: Nathan Scott's kernel.all.pswitch patch MIME-Version: 1.0 Content-Type: 
multipart/alternative; boundary="----=_Part_65160_27299252.1184598891995" X-archive-position: 1314 X-Approved-By: makc@sgi.com X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: jason.rappleye@gmail.com Precedence: bulk X-list: pcp ------=_Part_65160_27299252.1184598891995 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline Hi, It would appear that Nathan Scott's kernel.all.pswitch patch http://oss.sgi.com/archives/pcp/2007-03/msg00007.html didn't make it into the 2.7.1 release. Would appreciate it if it made it into the next release. Thanks, Jason
------=_Part_65160_27299252.1184598891995-- From kimbrr@sgi.com Mon Jul 16 17:36:08 2007 Received: with ECARTIS (v1.0.0; list pcp); Mon, 16 Jul 2007 17:36:13 -0700 (PDT) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id l6H0a5bm022719 for ; Mon, 16 Jul 2007 17:36:07 -0700 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id KAA07498; Tue, 17 Jul 2007 10:36:02 +1000 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id l6H0a0eW26671901; Tue, 17 Jul 2007 10:36:01 +1000 (AEST) Received: from localhost (kimbrr@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) with ESMTP id l6H0ZwaN26593879; Tue, 17 Jul 2007 10:35:59 +1000 (AEST) X-Authentication-Warning: snort.melbourne.sgi.com: kimbrr owned process doing -bs Date: Tue, 17 Jul 2007 10:35:58 +1000 From: Michael Newton X-X-Sender: kimbrr@snort.melbourne.sgi.com To: Jason Rappleye cc: pcp@oss.sgi.com Subject: Re: Nathan Scott's kernel.all.pswitch patch In-Reply-To: <6ccb8bf80707160814u3093b234l597ce029b5687fb6@mail.gmail.com> Message-ID: References: <6ccb8bf80707160814u3093b234l597ce029b5687fb6@mail.gmail.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1315 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: kimbrr@sgi.com Precedence: bulk X-list: pcp On Mon, 16 Jul 2007, Jason Rappleye wrote: > It would appear that Nathan Scott's kernel.all.pswitch patch > > http://oss.sgi.com/archives/pcp/2007-03/msg00007.html > > didn't make it into the 2.7.1 release. Would appreciate it if it made it > into the next release. sure. Theres a queue of stuff which i hope to get in soon. 
Its mostly been slowed down by getting a solution for the problem of restart not working after an upgrade, that would also work for our chroot builds. I hope to get back to this RSN Dr.Michael("Kimba")Newton kimbrr@sgi.com From kimbrr@sgi.com Mon Jul 16 19:43:57 2007 Received: with ECARTIS (v1.0.0; list pcp); Mon, 16 Jul 2007 19:44:01 -0700 (PDT) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id l6H2hrbm022678 for ; Mon, 16 Jul 2007 19:43:55 -0700 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA11855 for ; Tue, 17 Jul 2007 12:43:55 +1000 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id l6H2hteW26779300 for ; Tue, 17 Jul 2007 12:43:55 +1000 (AEST) Received: from localhost (kimbrr@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) with ESMTP id l6H2hsko26732310 for ; Tue, 17 Jul 2007 12:43:55 +1000 (AEST) X-Authentication-Warning: snort.melbourne.sgi.com: kimbrr owned process doing -bs Date: Tue, 17 Jul 2007 12:43:54 +1000 From: Michael Newton X-X-Sender: kimbrr@snort.melbourne.sgi.com To: pcp@oss.sgi.com Subject: Patch to allow upgraded pcp to stop old pmcd Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1316 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: kimbrr@sgi.com Precedence: bulk X-list: pcp Nathan has previously pointed out that if you install the 2.7.1 over an existing installation (eg rpm -U), the new /etc/init.d/pcp cannot stop a pre-existing instance of pmcd. This was an oversight with the pidfile change. 
While Nathan proposed just adding a fallback to killall for this case, this was a problem for us as we have a build system (based on SuSE's) for building in a chroot, and the killall means removing the rpm from the chroot stops pcp for the whole machine. Although it would be preferable to uninstall --noscripts, getting this change through SuSE could take time, so for now, this patch assumes that if there is a pmcd, but no pidfile or log file, we're in a chroot. In addition, there is now a check that the pid in the pidfile matches that found for pmcd --- a/mgmt/pcp/src/pmcd/rc_pcp 2007-07-17 12:30:43.000000000 +1000 +++ b/mgmt/pcp/src/pmcd/rc_pcp 2007-07-12 11:59:39.045688455 +1000 @@ -382,14 +382,44 @@ return 0 fi + # if pmcd is running but we can't find a pidfile, or a logfile at the + # configured or default location, assume chroot + # + logf=`_pmcd_logfile` + [ -f $logf ] || logf=$RUNDIR/pmcd.log + if [ ! -f $PCP_RUN_DIR/pmcd.pid -a ! -f $logf ] + then + echo "Process ..." + cat $tmp.tmp + echo "$prog: +Warning: found no $PCP_RUN_DIR/pmcd.pid + and no $logf. + Assuming an uninstall from a chroot: PMCD not killed. + If this is incorrect, kill -TERM can be applied to the above PID." + exit + # Send pmcd a SIGTERM, which is noted as a pending shutdown. # When finished the currently active request, pmcd will close any # connections, wait for any agents, and then exit. # - if [ -f $PCP_RUN_DIR/pmcd.pid ] + elif [ -f $PCP_RUN_DIR/pmcd.pid ] then - kill -TERM `cat $PCP_RUN_DIR/pmcd.pid` - rm -f $PCP_RUN_DIR/pmcd.pid + TOKILL=`cat $PCP_RUN_DIR/pmcd.pid` + if grep "^$TOKILL$" $tmp.tmp >/dev/null + then + kill -TERM $TOKILL >/dev/null 2>&1 + rm -f $PCP_RUN_DIR/pmcd.pid + else + echo "Process ..." + cat $tmp.tmp + echo "$prog: +Warning: process ID in $PCP_RUN_DIR/pmcd.pid is $TOKILL. + Check logfile $logf. When you are ready to proceed, remove + $PCP_RUN_DIR/pmcd.pid before retrying." 
+ exit + fi + else + $PCP_KILLALL_PROG -TERM pmcd > /dev/null 2>&1 fi $ECHO $PCP_ECHO_N "Waiting for PMCD to terminate ...""$PCP_ECHO_C" delay=200 # tenths of a second @@ -471,7 +501,7 @@ 'start'|'restart') _get_pids_by_name pmcd >$tmp.tmp - [ -s $tmp.tmp ] && _shutdown + [ -f $PCP_RUN_DIR/pmcd.pid -o -s $tmp.tmp ] && _shutdown # PMCD and PMDA messages should go to stderr, not the GUI notifiers # Dr.Michael("Kimba")Newton kimbrr@sgi.com From nscott@aconex.com Tue Jul 17 15:24:57 2007 Received: with ECARTIS (v1.0.0; list pcp); Tue, 17 Jul 2007 15:25:04 -0700 (PDT) Received: from postoffice.aconex.com (mail.app.aconex.com [203.89.192.138]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l6HMOtbm005123 for ; Tue, 17 Jul 2007 15:24:57 -0700 Received: from edge.yarra.acx (unknown [203.89.192.141]) by postoffice.aconex.com (Postfix) with ESMTP id 89459127C44E for ; Wed, 18 Jul 2007 08:24:57 +1000 (EST) Subject: kmchart updates From: Nathan Scott Reply-To: nscott@aconex.com To: pcp@oss.sgi.com Content-Type: text/plain Organization: Aconex Date: Wed, 18 Jul 2007 08:24:44 +1000 Message-Id: <1184711084.15488.535.camel@edge.yarra.acx> Mime-Version: 1.0 X-Mailer: Evolution 2.6.3 Content-Transfer-Encoding: 7bit X-archive-position: 1317 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: nscott@aconex.com Precedence: bulk X-list: pcp Changes committed to git://oss.sgi.com:8090/nathans/kmchart.git man/man1/kmquery.1 | 13 ++++++++++++- src/query/kmconfirm.sh | 2 +- src/query/kmmessage.sh | 2 +- src/query/kmquery.cpp | 39 ++++++++++++++++++++++++++------------- 4 files changed, 40 insertions(+), 16 deletions(-) commit 5074ad33535d0a6a5708c67e3d18629eeb5644e4 Author: Nathan Scott Date: Wed Jul 18 08:06:24 2007 +1000 Improve on/off/auto-scrolling of the text in kmquery; add a man page example. 
commit 02a40453a28dac275ba2eb4ad881f36a9bfb076a Author: Nathan Scott Date: Fri Jul 13 07:42:30 2007 +1000 Pass kmquery command line arguments through without any sh argument frobbing. From nscott@aconex.com Tue Jul 17 17:51:28 2007 Received: with ECARTIS (v1.0.0; list pcp); Tue, 17 Jul 2007 17:51:33 -0700 (PDT) Received: from postoffice.aconex.com (mail.app.aconex.com [203.89.192.138]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l6I0pPbm008546 for ; Tue, 17 Jul 2007 17:51:27 -0700 Received: from edge.yarra.acx (unknown [203.89.192.141]) by postoffice.aconex.com (Postfix) with ESMTP id B2792127C46B; Wed, 18 Jul 2007 10:51:28 +1000 (EST) Subject: Re: Patch to allow upgraded pcp to stop old pmcd From: Nathan Scott Reply-To: nscott@aconex.com To: Michael Newton Cc: pcp@oss.sgi.com In-Reply-To: References: Content-Type: multipart/mixed; boundary="=-peQ4iBRtbgZoImaUHBQG" Organization: Aconex Date: Wed, 18 Jul 2007 10:51:15 +1000 Message-Id: <1184719875.15488.545.camel@edge.yarra.acx> Mime-Version: 1.0 X-Mailer: Evolution 2.6.3 X-archive-position: 1318 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: nscott@aconex.com Precedence: bulk X-list: pcp --=-peQ4iBRtbgZoImaUHBQG Content-Type: text/plain Content-Transfer-Encoding: 7bit On Tue, 2007-07-17 at 12:43 +1000, Michael Newton wrote: > Nathan has previously pointed out that if you install the 2.7.1 > over an existing installation (eg rpm -U), the new /etc/init.d/pcp > cannot stop a pre-existing instance of pmcd. This was an oversight with > the pidfile change. While Nathan proposed just adding a fallback to > killall for this case, this was a problem for us as we have a build system > (based on SuSE's) for building in a chroot, and the killall means removing > the rpm from the chroot stops pcp for the whole machine. 
Although it would > be preferable to uninstall --noscripts, getting this change through SuSE > could take time, so for now, this patch assumes that if there is a pmcd, > but no pidfile or log file, we're in a chroot. In addition, there is now > a check that the pid in the pidfile matches that found for pmcd Yeah. What a sad and sorry tale. :) > > --- a/mgmt/pcp/src/pmcd/rc_pcp 2007-07-17 12:30:43.000000000 +1000 > +++ b/mgmt/pcp/src/pmcd/rc_pcp 2007-07-12 11:59:39.045688455 +1000 > @@ -382,14 +382,44 @@ > return 0 > fi > > + # if pmcd is running but we can't find a pidfile, or a logfile at the > + # configured or default location, assume chroot > + # > + logf=`_pmcd_logfile` > + [ -f $logf ] || logf=$RUNDIR/pmcd.log > + if [ ! -f $PCP_RUN_DIR/pmcd.pid -a ! -f $logf ] > + then > + echo "Process ..." > + cat $tmp.tmp The pid should probably be printed on the same line, I'll send a followup patch to tidy that up. > @@ -471,7 +501,7 @@ > > 'start'|'restart') > _get_pids_by_name pmcd >$tmp.tmp > - [ -s $tmp.tmp ] && _shutdown > + [ -f $PCP_RUN_DIR/pmcd.pid -o -s $tmp.tmp ] && _shutdown Hmm - in the case where pmcd is running, we now end up calling _get_pids_by_name twice (since _shutdown does it too now)... we can really just call "_shutdown" here now, always, I think ... but, thats independent cleanup as well. I've included that in the followup patch (attached). cheers. -- Nathan --=-peQ4iBRtbgZoImaUHBQG Content-Disposition: attachment; filename=diff Content-Type: text/x-patch; name=diff; charset=utf-8 Content-Transfer-Encoding: 7bit diff --git a/src/pmcd/rc_pcp b/src/pmcd/rc_pcp index a00297b..70bb360 100644 --- a/src/pmcd/rc_pcp +++ b/src/pmcd/rc_pcp @@ -394,20 +394,20 @@ _shutdown() _get_pids_by_name pmcd >$tmp.tmp if [ ! 
-s $tmp.tmp ] then - echo "$prog: PMCD not running" + [ "$1" = verbose ] && echo "$prog: PMCD not running" rm -f $PCP_RUN_DIR/pmcd.pid return 0 fi - # if pmcd is running but we can't find a pidfile, or a logfile at the - # configured or default location, assume chroot + # If pmcd is running but we can't find a pidfile, or a logfile at the + # configured or default location, assume this script is being run via + # a chroot build environment (and hence we do not want to kill pmcd). # logf=`_pmcd_logfile` [ -f $logf ] || logf=$RUNDIR/pmcd.log if [ ! -f $PCP_RUN_DIR/pmcd.pid -a ! -f $logf ] then - echo "Process ..." - cat $tmp.tmp + $ECHO $PCP_ECHO_N "PMCD process ... "`cat $tmp.tmp` echo "$prog: Warning: found no $PCP_RUN_DIR/pmcd.pid and no $logf. @@ -427,8 +427,7 @@ Warning: found no $PCP_RUN_DIR/pmcd.pid kill -TERM $TOKILL >/dev/null 2>&1 rm -f $PCP_RUN_DIR/pmcd.pid else - echo "Process ..." - cat $tmp.tmp + $ECHO $PCP_ECHO_N "PMCD process ... "`cat $tmp.tmp` echo "$prog: Warning: process ID in $PCP_RUN_DIR/pmcd.pid is $TOKILL. Check logfile $logf. When you are ready to proceed, remove @@ -450,8 +449,7 @@ Warning: process ID in $PCP_RUN_DIR/pmcd.pid is $TOKILL. done if [ $delay -eq 0 ] # It just WON'T DIE, give up. then - echo "Process ..." - cat $tmp.tmp + $ECHO $PCP_ECHO_N "PMCD process ... "`cat $tmp.tmp` echo "$prog: Warning: PMCD won't die!" exit fi @@ -517,8 +515,7 @@ $RC_RESET case "$1" in 'start'|'restart') - _get_pids_by_name pmcd >$tmp.tmp - [ -f $PCP_RUN_DIR/pmcd.pid -o -s $tmp.tmp ] && _shutdown + _shutdown quietly # PMCD and PMDA messages should go to stderr, not the GUI notifiers # @@ -606,7 +603,7 @@ Error: PMCD control file '"$PCP_PMCDCONF_PATH"' is missing, cannot start PMCD.' 
# site-local customisations before PCP shutdown # [ -x $PCPLOCAL ] && $PCPLOCAL $VFLAG stop - _shutdown + _shutdown verbose status=0 ;; --=-peQ4iBRtbgZoImaUHBQG-- From nscott@aconex.com Tue Jul 17 17:59:16 2007 Received: with ECARTIS (v1.0.0; list pcp); Tue, 17 Jul 2007 17:59:20 -0700 (PDT) Received: from postoffice.aconex.com (mail.app.aconex.com [203.89.192.138]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l6I0xFbm011800 for ; Tue, 17 Jul 2007 17:59:16 -0700 Received: from edge.yarra.acx (unknown [203.89.192.141]) by postoffice.aconex.com (Postfix) with ESMTP id D48D7127C46E for ; Wed, 18 Jul 2007 10:59:17 +1000 (EST) Subject: pcp updates From: Nathan Scott Reply-To: nscott@aconex.com To: pcp@oss.sgi.com Content-Type: text/plain Organization: Aconex Date: Wed, 18 Jul 2007 10:59:04 +1000 Message-Id: <1184720344.15488.547.camel@edge.yarra.acx> Mime-Version: 1.0 X-Mailer: Evolution 2.6.3 Content-Transfer-Encoding: 7bit X-archive-position: 1319 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: nscott@aconex.com Precedence: bulk X-list: pcp Changes committed to git://oss.sgi.com:8090/nathans/pcp.git VERSION.pcp | 2 - src/pmcd/rc_pcp | 57 +++++++++++++++++++++++++++++++++-------------- src/pmdas/windows/data.c | 2 - 3 files changed, 43 insertions(+), 18 deletions(-) commit 6b771b5558a85ef98c39f40ada98b0a20dc00350 Author: Nathan Scott Date: Wed Jul 18 10:54:00 2007 +1000 Append current date string to build number commit 4b4bed0caca266bd44d0e4a84dfb53c7425e709d Author: Nathan Scott Date: Wed Jul 18 10:53:14 2007 +1000 Minor cleanups to the kill-pmcd-from-chroot-build fix. 
commit f2e89c0b5b1bdc4d7fcbb35fee0bda37edb3fc23 Author: Michael Newton Date: Wed Jul 18 10:38:03 2007 +1000 Patch to allow upgraded pcp to stop old pmcd Nathan has previously pointed out that if you install the 2.7.1 over an existing installation (eg rpm -U), the new /etc/init.d/pcp cannot stop a pre-existing instance of pmcd. This was an oversight with the pidfile change. While Nathan proposed just adding a fallback to killall for this case, this was a problem for us as we have a build system (based on SuSE's) for building in a chroot, and the killall means removing the rpm from the chroot stops pcp for the whole machine. Although it would be preferable to uninstall --noscripts, getting this change through SuSE could take time, so for now, this patch assumes that if there is a pmcd, but no pidfile or log file, we're in a chroot. In addition, there is now a check that the pid in the pidfile matches that found for pmcd. commit 496deaf54f4adeea111c71dbf3886c82897b1e63 Author: Nathan Scott Date: Fri Jul 13 10:55:05 2007 +1000 Correct the pmunits for sqlserver.buf_mgr.page_life_expectancy. 
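The stop-time decision described in the commit message above can be sketched as a small shell function. This is only an illustration of the decision order as read from the posted patch, not the code in rc_pcp: the function name shutdown_action, its three arguments, and the result strings it prints are all made up for this sketch, and it reports the action rather than signalling any process.

```shell
#!/bin/sh
# Hypothetical helper (not part of rc_pcp): print the action that the
# stop logic described above would choose, without killing anything.
#   $1 = pidfile path
#   $2 = pmcd logfile path
#   $3 = file listing the PIDs of running pmcd processes, one per line
shutdown_action()
{
    pidfile=$1
    logfile=$2
    running=$3

    # No pmcd process was found at all: nothing to stop.
    if [ ! -s "$running" ]
    then
        echo not-running
        return 0
    fi

    if [ ! -f "$pidfile" ] && [ ! -f "$logfile" ]
    then
        # pmcd is running but left neither a pidfile nor a logfile here:
        # assume we are inside a chroot and leave the host's pmcd alone.
        echo assume-chroot
    elif [ -f "$pidfile" ]
    then
        # Only trust the pidfile if its PID is really a running pmcd.
        pid=`cat "$pidfile"`
        if grep "^$pid\$" "$running" >/dev/null
        then
            echo "kill $pid"
        else
            echo pid-mismatch
        fi
    else
        # A logfile but no pidfile: probably a pmcd started before the
        # upgrade, so fall back to killing by name.
        echo killall
    fi
}
```

For example, calling shutdown_action with a pidfile containing 1234 and a PID list that also contains 1234 prints "kill 1234"; with neither pidfile nor logfile present it prints "assume-chroot". Keeping the decision separate from the kill makes the chroot heuristic easy to exercise without a live pmcd.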
From kimbrr@sgi.com Tue Jul 17 18:20:34 2007 Received: with ECARTIS (v1.0.0; list pcp); Tue, 17 Jul 2007 18:20:39 -0700 (PDT) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id l6I1KUbm020687 for ; Tue, 17 Jul 2007 18:20:33 -0700 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA23610; Wed, 18 Jul 2007 11:20:29 +1000 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id l6I1KSeW27955686; Wed, 18 Jul 2007 11:20:29 +1000 (AEST) Received: from localhost (kimbrr@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) with ESMTP id l6I1KRmB27889026; Wed, 18 Jul 2007 11:20:28 +1000 (AEST) X-Authentication-Warning: snort.melbourne.sgi.com: kimbrr owned process doing -bs Date: Wed, 18 Jul 2007 11:20:27 +1000 From: Michael Newton X-X-Sender: kimbrr@snort.melbourne.sgi.com To: Nathan Scott cc: pcp@oss.sgi.com Subject: Re: Patch to allow upgraded pcp to stop old pmcd In-Reply-To: <1184719875.15488.545.camel@edge.yarra.acx> Message-ID: References: <1184719875.15488.545.camel@edge.yarra.acx> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1320 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: kimbrr@sgi.com Precedence: bulk X-list: pcp nathan: >> + # if pmcd is running but we can't find a pidfile, or a logfile at the >> + # configured or default location, assume chroot >> + # >> + logf=`_pmcd_logfile` >> + [ -f $logf ] || logf=$RUNDIR/pmcd.log >> + if [ ! -f $PCP_RUN_DIR/pmcd.pid -a ! -f $logf ] >> + then >> + echo "Process ..." >> + cat $tmp.tmp >The pid should probably be printed on the same line, I'll send a >followup patch to tidy that up. If you like.. 
i was merely following the pre-existing style used for the message when pmcd would not die. I think it is used in several places now, so you better make sure you get all of them eggs, granny ;) Dr.Michael("Kimba")Newton kimbrr@sgi.com From nscott@aconex.com Wed Jul 18 16:28:00 2007 Received: with ECARTIS (v1.0.0; list pcp); Wed, 18 Jul 2007 16:28:05 -0700 (PDT) Received: from postoffice.aconex.com (mail.app.aconex.com [203.89.192.138]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l6INRxbm020500 for ; Wed, 18 Jul 2007 16:28:00 -0700 Received: from edge.yarra.acx (unknown [203.89.192.141]) by postoffice.aconex.com (Postfix) with ESMTP id D2005127C519 for ; Thu, 19 Jul 2007 09:28:00 +1000 (EST) Subject: pcp updates From: Nathan Scott Reply-To: nscott@aconex.com To: pcp@oss.sgi.com Content-Type: text/plain Organization: Aconex Date: Thu, 19 Jul 2007 09:27:49 +1000 Message-Id: <1184801270.16678.20.camel@edge.yarra.acx> Mime-Version: 1.0 X-Mailer: Evolution 2.6.3 Content-Transfer-Encoding: 7bit X-archive-position: 1321 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: nscott@aconex.com Precedence: bulk X-list: pcp Changes committed to git://oss.sgi.com:8090/nathans/pcp.git src/pmdumptext/GNUmakefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) commit 2ff3cbef54679ab21c53d7192c02ae84ad7cf0d0 Author: Nathan Scott Date: Thu Jul 19 09:26:03 2007 +1000 Fix include path for pmdumptext. 
From nscott@aconex.com Sun Jul 22 16:55:31 2007 Received: with ECARTIS (v1.0.0; list pcp); Sun, 22 Jul 2007 16:55:35 -0700 (PDT) Received: from postoffice.aconex.com (mail.app.aconex.com [203.89.192.138]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l6MNtTbm012426 for ; Sun, 22 Jul 2007 16:55:31 -0700 Received: from edge.yarra.acx (unknown [203.89.192.141]) by postoffice.aconex.com (Postfix) with ESMTP id 0BCC0127C457 for ; Mon, 23 Jul 2007 09:55:31 +1000 (EST) Subject: kmchart updates From: Nathan Scott Reply-To: nscott@aconex.com To: pcp@oss.sgi.com Content-Type: text/plain Organization: Aconex Date: Mon, 23 Jul 2007 09:55:30 +1000 Message-Id: <1185148530.10702.0.camel@edge.yarra.acx> Mime-Version: 1.0 X-Mailer: Evolution 2.6.3 Content-Transfer-Encoding: 7bit X-archive-position: 1322 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: nscott@aconex.com Precedence: bulk X-list: pcp Changes committed to git://oss.sgi.com:8090/nathans/kmchart.git dev/null |binary images/document-export.png |binary images/document-export.svg | 522 +++++++++++++++++++++++++++++++++++++ images/document-save-as.svg | 590 ------------------------------------------ src/chart/exportdialog.ui | 371 ++++++++++++++++++++++++++ src/chart/exportdialog.ui.h | 107 +++++++ src/chart/kmchart.pro | 9 src/chart/kmchart.ui | 38 ++ src/chart/kmchart.ui.h | 18 + src/chart/main.cpp | 18 - src/chart/main.h | 3 src/chart/recorddialog.ui | 606 +++++++++++++++++++++++--------------------- src/chart/recorddialog.ui.h | 47 ++- src/chart/view.cpp | 6 src/chart/view.h | 7 15 files changed, 1420 insertions(+), 922 deletions(-) commit 4aedf4f5b9491af7eb62da8699ac96cf19d8f110 Author: Nathan Scott Date: Mon Jul 23 09:37:07 2007 +1000 Add an Export (to bitmap) File option. Its in a similar state to Print atm... more work needed. 
From nscott@aconex.com Sun Jul 22 16:57:56 2007 Received: with ECARTIS (v1.0.0; list pcp); Sun, 22 Jul 2007 16:58:01 -0700 (PDT) Received: from postoffice.aconex.com (mail.app.aconex.com [203.89.192.138]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l6MNvtbm013580 for ; Sun, 22 Jul 2007 16:57:56 -0700 Received: from edge.yarra.acx (unknown [203.89.192.141]) by postoffice.aconex.com (Postfix) with ESMTP id D65C5127C690 for ; Mon, 23 Jul 2007 09:57:58 +1000 (EST) Subject: pcp updates From: Nathan Scott Reply-To: nscott@aconex.com To: pcp@oss.sgi.com Content-Type: text/plain Organization: Aconex Date: Mon, 23 Jul 2007 09:57:58 +1000 Message-Id: <1185148678.10702.2.camel@edge.yarra.acx> Mime-Version: 1.0 X-Mailer: Evolution 2.6.3 Content-Transfer-Encoding: 7bit X-archive-position: 1323 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: nscott@aconex.com Precedence: bulk X-list: pcp Changes committed to git://oss.sgi.com:8090/nathans/pcp.git VERSION.pcp | 2 configure | 9996 ++++++++++++++++++++--------------------- configure.in | 8 src/include/platform_defs.h.in | 6 src/libpcp/src/util.c | 8 src/pmdas/linux/proc_pid.c | 11 src/pmlogger/pmlogger.c | 9 7 files changed, 5129 insertions(+), 4911 deletions(-) commit 45da7fc4a2463279b716351117c8f903daff03f7 Merge: 8ab91d3... 903e95b... Author: Nathan Scott Date: Mon Jul 23 09:50:01 2007 +1000 Merge leaf:/source/git/pcp into nathans commit 8ab91d31d4656012c4ded4688e1d4f7346487b31 Author: Nathan Scott Date: Mon Jul 23 09:46:16 2007 +1000 Remove redundant init-to-zero of fields in a global variable - minor cleanup. commit 903e95b800ef9004144d5815ed682292f3a6288f Author: Nathan Scott Date: Mon Jul 23 09:40:11 2007 +1000 Open temp files exclusively, avoiding potential symlink vulnerabilities. 
commit 4a58cde5b53fe2a1b218618f8ec951b9a43f22d7 Author: Nathan Scott Date: Mon Jul 23 07:43:38 2007 +1000 Update checked in configure script, picking up Mac and Debian-autoconf fixes commit 2a762c03137c5929b6f6de222961dcb5dd3142a7 Author: Nathan Scott Date: Mon Jul 23 07:40:55 2007 +1000 Fix MacOSX endian detection for all Intel x86_64 based Macs. commit e560672032823cbba1062ec5982e55b9b952db82 Author: Nathan Scott Date: Thu Jul 19 14:54:12 2007 +1000 Append current date string to build number From nscott@aconex.com Mon Jul 23 16:06:36 2007 Received: with ECARTIS (v1.0.0; list pcp); Mon, 23 Jul 2007 16:06:41 -0700 (PDT) Received: from postoffice.aconex.com (mail.app.aconex.com [203.89.192.138]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l6NN6Xbm024489 for ; Mon, 23 Jul 2007 16:06:35 -0700 Received: from edge.yarra.acx (unknown [203.89.192.141]) by postoffice.aconex.com (Postfix) with ESMTP id 6B058123C171 for ; Tue, 24 Jul 2007 09:06:36 +1000 (EST) Subject: kmchart updates From: Nathan Scott Reply-To: nscott@aconex.com To: pcp@oss.sgi.com Content-Type: text/plain Organization: Aconex Date: Tue, 24 Jul 2007 09:06:38 +1000 Message-Id: <1185231998.10702.37.camel@edge.yarra.acx> Mime-Version: 1.0 X-Mailer: Evolution 2.6.3 Content-Transfer-Encoding: 7bit X-archive-position: 1324 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: nscott@aconex.com Precedence: bulk X-list: pcp Changes committed to git://oss.sgi.com:8090/nathans/kmchart.git src/chart/main.cpp | 9 +++++++++ src/time/main.cpp | 9 +++++++++ 2 files changed, 18 insertions(+) commit 51a055501a3ede9de37090cd6e654016eb6213be Author: Nathan Scott Date: Tue Jul 24 08:32:55 2007 +1000 Use kmquery as the target for all pmprintf strings in kmchart/kmtime. 
From nscott@aconex.com Mon Jul 23 16:08:43 2007 Received: with ECARTIS (v1.0.0; list pcp); Mon, 23 Jul 2007 16:08:47 -0700 (PDT) Received: from postoffice.aconex.com (mail.app.aconex.com [203.89.192.138]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l6NN8fbm025002 for ; Mon, 23 Jul 2007 16:08:42 -0700 Received: from edge.yarra.acx (unknown [203.89.192.141]) by postoffice.aconex.com (Postfix) with ESMTP id 3600F123C171 for ; Tue, 24 Jul 2007 09:08:45 +1000 (EST) Subject: pcp updates From: Nathan Scott Reply-To: nscott@aconex.com To: pcp@oss.sgi.com Content-Type: text/plain Organization: Aconex Date: Tue, 24 Jul 2007 09:08:47 +1000 Message-Id: <1185232127.10702.39.camel@edge.yarra.acx> Mime-Version: 1.0 X-Mailer: Evolution 2.6.3 Content-Transfer-Encoding: 7bit X-archive-position: 1325 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: nscott@aconex.com Precedence: bulk X-list: pcp Changes committed to git://oss.sgi.com:8090/nathans/pcp.git README | 2 +- build/mac/installer-resources/ReadMe.html | 2 +- src/libpcp/src/util.c | 4 ++-- src/pmlogger/pmlogger.c | 2 +- 4 files changed, 5 insertions(+), 5 deletions(-) commit 62bb0a32228b8822108568e9bdb54851cc437ec1 Merge: 4bd690d... 3e85d29... Author: Nathan Scott Date: Tue Jul 24 09:03:28 2007 +1000 Merge leaf:/source/git/pcp into nathans commit 3e85d291af44702c2297fb71fed8e141f22af86a Author: Nathan Scott Date: Tue Jul 24 08:34:44 2007 +1000 Must use create flag on open for temp files now; resolve a+ mode vs fdopen. commit 4bd690d6c8815cdafe9624d41b90eeec9665571b Author: Nathan Scott Date: Mon Jul 23 16:35:39 2007 +1000 Fix URLs refering to relocated pages on www.sgi.com. 
From nscott@aconex.com Mon Jul 23 16:23:11 2007 Received: with ECARTIS (v1.0.0; list pcp); Mon, 23 Jul 2007 16:23:17 -0700 (PDT) Received: from postoffice.aconex.com (mail.app.aconex.com [203.89.192.138]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l6NNNAbm032321 for ; Mon, 23 Jul 2007 16:23:11 -0700 Received: from edge.yarra.acx (unknown [203.89.192.141]) by postoffice.aconex.com (Postfix) with ESMTP id B2376123C532; Tue, 24 Jul 2007 09:23:13 +1000 (EST) Subject: Re: Nathan Scott's kernel.all.pswitch patch From: Nathan Scott Reply-To: nscott@aconex.com To: Michael Newton , Jason Rappleye Cc: pcp@oss.sgi.com In-Reply-To: References: <6ccb8bf80707160814u3093b234l597ce029b5687fb6@mail.gmail.com> Content-Type: text/plain Organization: Aconex Date: Tue, 24 Jul 2007 09:23:15 +1000 Message-Id: <1185232996.10702.48.camel@edge.yarra.acx> Mime-Version: 1.0 X-Mailer: Evolution 2.6.3 Content-Transfer-Encoding: 7bit X-archive-position: 1326 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: nscott@aconex.com Precedence: bulk X-list: pcp On Tue, 2007-07-17 at 10:35 +1000, Michael Newton wrote: > On Mon, 16 Jul 2007, Jason Rappleye wrote: > > It would appear that Nathan Scott's kernel.all.pswitch patch > > > > http://oss.sgi.com/archives/pcp/2007-03/msg00007.html > > > > didn't make it into the 2.7.1 release. Would appreciate it if it > made it > > into the next release. > > sure. Theres a queue of stuff which i hope to get in soon. ... In the meantime, Jason, you can get the PCP packages that we use on our production systems from here: http://oss.sgi.com/~nathans/ This includes that fix and numerous others too. It's updated relatively frequently compared to base PCP, so additional testers are very welcome and will help catch new problems before they propagate into SGI's PCP. cheers.
-- Nathan From kimbrr@sgi.com Mon Jul 23 18:32:23 2007 Received: with ECARTIS (v1.0.0; list pcp); Mon, 23 Jul 2007 18:32:28 -0700 (PDT) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id l6O1WHbm030350 for ; Mon, 23 Jul 2007 18:32:21 -0700 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA06152 for ; Tue, 24 Jul 2007 11:32:19 +1000 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id l6O1WIeW34767036 for ; Tue, 24 Jul 2007 11:32:19 +1000 (AEST) Received: from localhost (kimbrr@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) with ESMTP id l6O1WHaw34772077 for ; Tue, 24 Jul 2007 11:32:18 +1000 (AEST) X-Authentication-Warning: snort.melbourne.sgi.com: kimbrr owned process doing -bs Date: Tue, 24 Jul 2007 11:32:17 +1000 From: Michael Newton X-X-Sender: kimbrr@snort.melbourne.sgi.com To: pcp@oss.sgi.com Subject: pidfile change breaks qa test which run non-root pmcd Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1327 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: kimbrr@sgi.com Precedence: bulk X-list: pcp Several QA tests run pmcd -f & With the pidfile change, they break because they dont have write permission on the pidfile. 
Here's a fix =========================================================================== mgmt/pcp/qa/023 =========================================================================== --- a/mgmt/pcp/qa/023 2007-07-24 11:15:59.000000000 +1000 +++ b/mgmt/pcp/qa/023 2007-07-24 10:44:45.946768948 +1000 @@ -82,6 +82,7 @@ SAVE_CONFIG=$tmp/pmcd.conf.save LOGCONTROL=$PCP_VAR_DIR/config/pmlogger/control SAVE_LOGCONTROL=$tmp/control.save +PIDFILE=$PCP_RUN_DIR/pmcd.pid here=`pwd` sudo=$here/sudo _needclean=true @@ -100,8 +101,7 @@ if $_needclean then _needclean=false - $sudo "killall -TERM pmcd" - + $sudo $PCP_RC_DIR/pcp stop | _filter_pcp_stop [ -f $SAVE_CONFIG ] && $sudo mv $SAVE_CONFIG $CONFIG [ -f $SAVE_LOGCONTROL ] && $sudo mv $SAVE_LOGCONTROL $LOGCONTROL [ ! -z "$SAVE_LOGGER" ] && _change_config pmlogger $SAVE_LOGGER @@ -188,6 +188,8 @@ # Note: start pmcd with -f so that its PID stays the same (no daemon) # +$sudo touch $PIDFILE +$sudo chmod a+w $PIDFILE PMCD_PORT=$port PATH=$PATH:$here/src-oss export PMCD_PORT PATH =========================================================================== mgmt/pcp/qa/023.out.1 =========================================================================== --- a/mgmt/pcp/qa/023.out.1 2007-07-24 11:15:59.000000000 +1000 +++ b/mgmt/pcp/qa/023.out.1 2007-07-23 17:35:12.505588863 +1000 @@ -146,6 +146,7 @@ inst [5 or "fake_cisco"] value 0 inst [6 or "fake_six"] value 0 +Waiting for PMCD to terminate ... Restart and ping pmcd ... Performance Co-Pilot starting PMCD (logfile is $PCP_LOG_DIR/pmcd.log) ... Performance Co-Pilot starting archive loggers ... 
=========================================================================== mgmt/pcp/qa/023.out.2 =========================================================================== --- a/mgmt/pcp/qa/023.out.2 2007-07-24 11:15:59.000000000 +1000 +++ b/mgmt/pcp/qa/023.out.2 2007-07-23 17:35:15.413211242 +1000 @@ -146,6 +146,7 @@ inst [5 or "fake_cisco"] value 0 inst [6 or "fake_six"] value 0 +Waiting for PMCD to terminate ... Restart and ping pmcd ... Performance Co-Pilot starting PMCD (logfile is $PCP_LOG_DIR/pmcd.log) ... Performance Co-Pilot starting archive loggers ... =========================================================================== mgmt/pcp/qa/051 =========================================================================== --- a/mgmt/pcp/qa/051 2007-07-24 11:15:59.000000000 +1000 +++ b/mgmt/pcp/qa/051 2007-07-24 10:48:52.662690141 +1000 @@ -36,6 +36,7 @@ oconfig=$config.O log=./pmcd.log me=`hostname` +pidfile=$PCP_RUN_DIR/pmcd.pid rm -f $here/$seq.full @@ -84,6 +85,7 @@ cleanup() { + $sudo rm -f $pidfile if [ -f $oconfig ] then $sudo "mv $oconfig $config" @@ -119,6 +121,8 @@ exit 1 fi +$sudo chmod a+w $pidfile + echo "terminating pmcd..." $sudo $PCP_RC_DIR/pcp stop | _filter_pcp_stop if [ -f $config ] =========================================================================== mgmt/pcp/qa/243 =========================================================================== --- a/mgmt/pcp/qa/243 2007-07-24 11:15:59.000000000 +1000 +++ b/mgmt/pcp/qa/243 2007-07-24 10:56:05.858359093 +1000 @@ -71,6 +71,7 @@ tmp=/var/tmp/$$ here=`pwd` sudo=$here/sudo +pidfile=$PCP_RUN_DIR/pmcd.pid _needclean=true rm -rf $tmp @@ -84,7 +85,6 @@ if $_needclean then _needclean=false - $sudo "killall -TERM pmcd" echo "Restart and ping pmcd ..." 
$sudo $PCP_RC_DIR/pcp start | _filter_pcp_start _wait_for_pmcd @@ -102,6 +102,8 @@ # Note: start pmcd with -f so that its PID stays the same (no daemon) # +$sudo touch $pidfile +$sudo chmod a+w $pidfile $PCP_PMCD_PROG -f -x err1 & _wait_for_pmcd =========================================================================== mgmt/pcp/qa/243.out.1 =========================================================================== --- a/mgmt/pcp/qa/243.out.1 2007-07-24 11:15:59.000000000 +1000 +++ b/mgmt/pcp/qa/243.out.1 2007-07-24 11:02:17.418092178 +1000 @@ -12,6 +12,7 @@ ... boring stuff deleted Checking that log hasn't changed ... Restart and ping pmcd ... +Waiting for PMCD to terminate ... Performance Co-Pilot starting PMCD (logfile is $PCP_LOG_DIR/pmcd.log) ... Performance Co-Pilot starting archive loggers ... pmcd.control.debug 1 =========================================================================== mgmt/pcp/qa/243.out.2 =========================================================================== --- a/mgmt/pcp/qa/243.out.2 2007-07-24 11:15:59.000000000 +1000 +++ b/mgmt/pcp/qa/243.out.2 2007-07-24 11:01:25.656816784 +1000 @@ -16,6 +16,7 @@ ok FD 4321 0x00000000 INADDR_ANY Checking that log hasn't changed ... Restart and ping pmcd ... +Waiting for PMCD to terminate ... Performance Co-Pilot starting PMCD (logfile is $PCP_LOG_DIR/pmcd.log) ... Performance Co-Pilot starting archive loggers ... 
pmcd.control.debug 1 =========================================================================== mgmt/pcp/qa/244 =========================================================================== --- a/mgmt/pcp/qa/244 2007-07-24 11:15:59.000000000 +1000 +++ b/mgmt/pcp/qa/244 2007-07-24 10:50:52.979045148 +1000 @@ -90,6 +90,7 @@ CONFIGSAVE=$tmp/pmcd.conf.save LOGCONTROL=$PCP_VAR_DIR/config/pmlogger/control SAVE_LOGCONTROL=$tmp/control.save +PIDFILE=$PCP_RUN_DIR/pmcd.pid here=`pwd` sudo=$here/sudo _needclean=true @@ -113,9 +114,8 @@ if $_needclean then _needclean=false - $sudo killall -TERM pmcd - $sudo rm -f $CONFIG - $sudo cp $CONFIGSAVE $CONFIG + $sudo $PCP_RC_DIR/pcp stop | _filter_pcp_stop + [ -f $CONFIGSAVE ] && $sudo mv $CONFIGSAVE $CONFIG $sudo chmod u-w $CONFIG [ -f $SAVE_LOGCONTROL ] && $sudo mv $SAVE_LOGCONTROL $LOGCONTROL _restore_loggers @@ -171,6 +171,8 @@ # Note: start pmcd with -f so that its PID stays the same (no daemon) # Sleep briefly to allow it time to start # +$sudo touch $PIDFILE +$sudo chmod a+w $PIDFILE PATH=$here/src-oss:$PATH export PATH $PCP_PMCD_PROG -f -t 2 & =========================================================================== mgmt/pcp/qa/244.out.1 =========================================================================== --- a/mgmt/pcp/qa/244.out.1 2007-07-24 11:15:59.000000000 +1000 +++ b/mgmt/pcp/qa/244.out.1 2007-07-23 18:13:30.377635303 +1000 @@ -103,6 +103,7 @@ Cleanup "fake_linux" agent (dom 60): unconfigured ... +Waiting for PMCD to terminate ... Restart and ping pmcd ... Performance Co-Pilot starting PMCD (logfile is $PCP_LOG_DIR/pmcd.log) ... Performance Co-Pilot starting archive loggers ... 
=========================================================================== mgmt/pcp/qa/244.out.2 =========================================================================== --- a/mgmt/pcp/qa/244.out.2 2007-07-24 11:15:59.000000000 +1000 +++ b/mgmt/pcp/qa/244.out.2 2007-07-23 18:13:32.673335406 +1000 @@ -103,6 +103,7 @@ Cleanup "fake_linux" agent (dom 60): unconfigured ... +Waiting for PMCD to terminate ... Restart and ping pmcd ... Performance Co-Pilot starting PMCD (logfile is $PCP_LOG_DIR/pmcd.log) ... Performance Co-Pilot starting archive loggers ... =========================================================================== mgmt/pcp/qa/254 =========================================================================== --- a/mgmt/pcp/qa/254 2007-07-24 11:15:59.000000000 +1000 +++ b/mgmt/pcp/qa/254 2007-07-24 11:08:18.703150175 +1000 @@ -18,6 +18,7 @@ sudo=$here/sudo _needclean=true pmns="nameall.pmns" +pidfile=$PCP_RUN_DIR/pmcd.pid rm -rf $tmp mkdir $tmp @@ -30,7 +31,6 @@ if $_needclean then _needclean=false - $sudo "killall -TERM pmcd" echo "Restart and ping pmcd ..." $sudo $PCP_RC_DIR/pcp start | _filter_pcp_start _wait_for_pmcd @@ -67,6 +67,8 @@ # Note: start pmcd with -f so that its PID stays the same (no daemon) # +$sudo touch $pidfile +$sudo chmod a+w $pidfile $PCP_PMCD_PROG -f -n $pmns & _wait_for_pmcd =========================================================================== mgmt/pcp/qa/254.out =========================================================================== --- a/mgmt/pcp/qa/254.out 2007-07-24 11:15:59.000000000 +1000 +++ b/mgmt/pcp/qa/254.out 2007-07-24 11:11:52.643348569 +1000 @@ -18,5 +18,6 @@ 29.0.11 alias yet.again and another_ten 29.0.11 alias yet.again and ten Restart and ping pmcd ... +Waiting for PMCD to terminate ... Performance Co-Pilot starting PMCD (logfile is $PCP_LOG_DIR/pmcd.log) ... Performance Co-Pilot starting archive loggers ... 
Dr.Michael("Kimba")Newton kimbrr@sgi.com From nscott@aconex.com Mon Jul 23 20:13:52 2007 Received: with ECARTIS (v1.0.0; list pcp); Mon, 23 Jul 2007 20:13:59 -0700 (PDT) Received: from postoffice.aconex.com (mail.app.aconex.com [203.89.192.138]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l6O3Dnbm019411 for ; Mon, 23 Jul 2007 20:13:52 -0700 Received: from edge.yarra.acx (unknown [203.89.192.141]) by postoffice.aconex.com (Postfix) with ESMTP id 7819F12FC076; Tue, 24 Jul 2007 13:13:50 +1000 (EST) Subject: Re: pidfile change breaks qa test which run non-root pmcd From: Nathan Scott Reply-To: nscott@aconex.com To: Michael Newton Cc: pcp@oss.sgi.com In-Reply-To: References: Content-Type: text/plain Organization: Aconex Date: Tue, 24 Jul 2007 13:13:52 +1000 Message-Id: <1185246832.10702.54.camel@edge.yarra.acx> Mime-Version: 1.0 X-Mailer: Evolution 2.6.3 Content-Transfer-Encoding: 7bit X-archive-position: 1328 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: nscott@aconex.com Precedence: bulk X-list: pcp On Tue, 2007-07-24 at 11:32 +1000, Michael Newton wrote: > Several QA tests run pmcd -f & > With the pidfile change, they break because they dont have > write permission on the pidfile. Here's a fix > The pidfile shouldn't be created when running pmcd with -f - this is a debugging option, and it shouldn't require root privileges to run pmcd this way. cheers.
-- Nathan From kimbrr@sgi.com Mon Jul 23 21:11:35 2007 Received: with ECARTIS (v1.0.0; list pcp); Mon, 23 Jul 2007 21:11:40 -0700 (PDT) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id l6O4BWbm004566 for ; Mon, 23 Jul 2007 21:11:34 -0700 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA10339; Tue, 24 Jul 2007 14:11:31 +1000 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id l6O4BTeW34743433; Tue, 24 Jul 2007 14:11:30 +1000 (AEST) Received: from localhost (kimbrr@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) with ESMTP id l6O4BSge34859940; Tue, 24 Jul 2007 14:11:29 +1000 (AEST) X-Authentication-Warning: snort.melbourne.sgi.com: kimbrr owned process doing -bs Date: Tue, 24 Jul 2007 14:11:28 +1000 From: Michael Newton X-X-Sender: kimbrr@snort.melbourne.sgi.com To: Nathan Scott cc: pcp@oss.sgi.com Subject: Re: pidfile change breaks qa test which run non-root pmcd In-Reply-To: <1185246832.10702.54.camel@edge.yarra.acx> Message-ID: References: <1185246832.10702.54.camel@edge.yarra.acx> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1329 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: kimbrr@sgi.com Precedence: bulk X-list: pcp On Tue, 24 Jul 2007, Nathan Scott wrote: > On Tue, 2007-07-24 at 11:32 +1000, Michael Newton wrote: > > Several QA tests run pmcd -f & > > With the pidfile change, they break because they dont have > > write permission on the pidfile. Here's a fix > > > > The pidfile shouldn't be created when running pmcd with -f - this is a > debugging option, and it shouldn't require root priveleges to run pmcd > this way. 
fair enough (tho some of the tests already do other fiddly logger/config chmods related to not being root). While im working on that, another review (to follow), to reinstate a fallback to SIGKILL when TERM doesnt work ps: qa/003 will need looking at in light of, eg, sample.darkness.. but i wont let that hold up the release. There'll no doubt be others tho.. From kimbrr@sgi.com Mon Jul 23 21:22:41 2007 Received: with ECARTIS (v1.0.0; list pcp); Mon, 23 Jul 2007 21:22:47 -0700 (PDT) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id l6O4Mabm008375 for ; Mon, 23 Jul 2007 21:22:39 -0700 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA10633 for ; Tue, 24 Jul 2007 14:22:38 +1000 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id l6O4MceW34850086 for ; Tue, 24 Jul 2007 14:22:38 +1000 (AEST) Received: from localhost (kimbrr@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) with ESMTP id l6O4MaRX34860983 for ; Tue, 24 Jul 2007 14:22:37 +1000 (AEST) X-Authentication-Warning: snort.melbourne.sgi.com: kimbrr owned process doing -bs Date: Tue, 24 Jul 2007 14:22:36 +1000 From: Michael Newton X-X-Sender: kimbrr@snort.melbourne.sgi.com To: pcp@oss.sgi.com Subject: in /etc/init.d/pcp, if TERM doesnt work, try KILL Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1330 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: kimbrr@sgi.com Precedence: bulk X-list: pcp in the current form of /etc/init.d/pcp, it is again possible without too much extra fuss to fall back to SIGKILL when TERM doesnt work. 
Heres a patch --- a/mgmt/pcp/src/pmcd/rc_pcp 2007-07-24 14:07:13.000000000 +1000 +++ b/mgmt/pcp/src/pmcd/rc_pcp 2007-07-24 14:00:49.654387806 +1000 @@ -4,7 +4,7 @@ # # Start or Stop the Performance Co-Pilot Daemon(s) # -# $Id: rc_pcp,v 1.44 2007/07/17 02:52:06 kimbrr Exp $ +# $Id: rc_pcp,v 1.43 2007/07/12 01:30:21 kimbrr Exp $ # # The following is for chkconfig on RedHat based systems # chkconfig: 2345 95 05 @@ -397,18 +397,12 @@ Assuming an uninstall from a chroot: PMCD not killed. If this is incorrect, kill -TERM can be applied to the above PID." exit - - # Send pmcd a SIGTERM, which is noted as a pending shutdown. - # When finished the currently active request, pmcd will close any - # connections, wait for any agents, and then exit. - # elif [ -f $PCP_RUN_DIR/pmcd.pid ] then TOKILL=`cat $PCP_RUN_DIR/pmcd.pid` if grep "^$TOKILL$" $tmp.tmp >/dev/null then - kill -TERM $TOKILL >/dev/null 2>&1 - rm -f $PCP_RUN_DIR/pmcd.pid + : else echo "Process ..." cat $tmp.tmp @@ -419,26 +413,45 @@ exit fi else - $PCP_KILLALL_PROG -TERM pmcd > /dev/null 2>&1 + TOKILL= fi + + # Send pmcd a SIGTERM, which is noted as a pending shutdown. + # When finished the currently active request, pmcd will close any + # connections, wait for any agents, and then exit. + # On failure, resort to SIGKILL. + # $ECHO $PCP_ECHO_N "Waiting for PMCD to terminate ...""$PCP_ECHO_C" delay=200 # tenths of a second - while [ $delay -gt 0 ] + for SIG in TERM KILL do - _get_pids_by_name pmcd >$tmp.tmp - [ ! -s $tmp.tmp ] && break - pmsleep 0.1 - delay=`expr $delay - 1` - [ `expr $delay % 10` -ne 0 ] || $ECHO $PCP_ECHO_N ".""$PCP_ECHO_C" - done - if [ $delay -eq 0 ] # It just WON'T DIE, give up. - then + if [ "x$TOKILL" == "x" ] + then + $PCP_KILLALL_PROG -$SIG pmcd > /dev/null 2>&1 + else + kill -$SIG $TOKILL >/dev/null 2>&1 + rm -f $PCP_RUN_DIR/pmcd.pid + fi + while [ $delay -gt 0 ] + do + _get_pids_by_name pmcd >$tmp.tmp + [ ! 
-s $tmp.tmp ] && break 2 + pmsleep 0.1 + delay=`expr $delay - 1` + [ `expr $delay % 10` -ne 0 ] || $ECHO $PCP_ECHO_N ".""$PCP_ECHO_C" + done echo "Process ..." cat $tmp.tmp - echo "$prog: Warning: PMCD won't die!" - exit - fi - $RC_STATUS -v + if [ "$SIG" == "TERM" ] + then + echo "$prog: Warning: Forcing PMCD to terminate!" + delay=20 + else + echo "$prog: Warning: PMCD won't die!" + exit + fi + done + $RC_STATUS -v pmpost "stop pmcd from $PCP_RC_DIR/pcp" } Dr.Michael("Kimba")Newton kimbrr@sgi.com From kimbrr@sgi.com Mon Jul 23 21:43:40 2007 Received: with ECARTIS (v1.0.0; list pcp); Mon, 23 Jul 2007 21:43:44 -0700 (PDT) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id l6O4habm015028 for ; Mon, 23 Jul 2007 21:43:39 -0700 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id OAA11073 for ; Tue, 24 Jul 2007 14:43:39 +1000 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id l6O4hceW34808869 for ; Tue, 24 Jul 2007 14:43:38 +1000 (AEST) Received: from localhost (kimbrr@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) with ESMTP id l6O4hbhq34853745 for ; Tue, 24 Jul 2007 14:43:38 +1000 (AEST) X-Authentication-Warning: snort.melbourne.sgi.com: kimbrr owned process doing -bs Date: Tue, 24 Jul 2007 14:43:37 +1000 From: Michael Newton X-X-Sender: kimbrr@snort.melbourne.sgi.com To: pcp@oss.sgi.com Subject: [PATCH] pidfile change breaks qa test which run non-root pmcd Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1331 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: kimbrr@sgi.com Precedence: bulk X-list: pcp in response to nathan's comment.. 
--- a/mgmt/pcp/src/pmcd/src/pmcd.c 2007-07-24 14:41:58.000000000 +1000 +++ b/mgmt/pcp/src/pmcd/src/pmcd.c 2007-07-24 14:28:57.140447430 +1000 @@ -1206,22 +1206,23 @@ /*NOTREACHED*/ } - run_dir = pmGetConfig("PCP_RUN_DIR"); - i = strlen(run_dir); - pidpath = malloc(i + strlen(PIDFILE) + 1); - memcpy(pidpath, run_dir, i); - strcpy(pidpath + i, PIDFILE); - pidfile = fopen(pidpath, "w"); - if (pidfile == NULL) { - fprintf(stderr, "Error: Cant open pidfile %s\n", pidpath); - DontStart(); - /*NOTREACHED*/ - } - fprintf(pidfile, "%d", getpid()); - fflush(pidfile); - fclose(pidfile); - free(pidpath); - + if (run_daemon) { + run_dir = pmGetConfig("PCP_RUN_DIR"); + i = strlen(run_dir); + pidpath = malloc(i + strlen(PIDFILE) + 1); + memcpy(pidpath, run_dir, i); + strcpy(pidpath + i, PIDFILE); + pidfile = fopen(pidpath, "w"); + if (pidfile == NULL) { + fprintf(stderr, "Error: Cant open pidfile %s\n", pidpath); + DontStart(); + /*NOTREACHED*/ + } + fprintf(pidfile, "%d", getpid()); + fflush(pidfile); + fclose(pidfile); + free(pidpath); + } PrintAgentInfo(stderr); __pmAccDumpHosts(stderr); fprintf(stderr, "\npmcd: PID = %u", (int)getpid()); Dr.Michael("Kimba")Newton kimbrr@sgi.com From jason.rappleye@gmail.com Tue Jul 24 15:23:23 2007 Received: with ECARTIS (v1.0.0; list pcp); Wed, 25 Jul 2007 01:06:24 -0700 (PDT) Received: from wa-out-1112.google.com (wa-out-1112.google.com [209.85.146.177]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l6OMNLbm005026 for ; Tue, 24 Jul 2007 15:23:22 -0700 Received: by wa-out-1112.google.com with SMTP id k22so2917075waf for ; Tue, 24 Jul 2007 15:23:25 -0700 (PDT) DKIM-Signature: a=rsa-sha1; c=relaxed/relaxed; d=gmail.com; s=beta; h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:references; b=FgGcDo+NCPJDW7gMFCJ5d063Gh3dAVXqikpEao1CKqxJywiddTZGNI/TGhgx4ruvwp1UIWk3v64rGD38RxvJsSkGnxhiQ5L2dCvv8CiIKhf9n7zqeiAwOe2t6Cb5nyszRtSZFwKe4RjxaGM+61532fg7eTNXCjRB6HEIU4VhFk4= 
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:references; b=dE9v6axRJggaaOXpiMIyZM+beElZXbVw5oeAfnmjGqBW1omOKNvF80KOSggVVxvbTVapyCyheup6PJNbECoqx/T9gOEEw3jEf9ACIpcb0h2bXKOwOIGABM2NeHXSiyM4q2UK4dqRzf+aId1lt/PP9Wcz/x++edOpoLjz6emW2+o= Received: by 10.114.153.18 with SMTP id a18mr4490606wae.1185315374395; Tue, 24 Jul 2007 15:16:14 -0700 (PDT) Received: by 10.115.72.10 with HTTP; Tue, 24 Jul 2007 15:16:14 -0700 (PDT) Message-ID: <6ccb8bf80707241516u638affc3y2a5eba12efc9e635@mail.gmail.com> Date: Tue, 24 Jul 2007 18:16:14 -0400 From: "Jason Rappleye" To: nscott@aconex.com Subject: Re: Nathan Scott's kernel.all.pswitch patch Cc: "Michael Newton" , pcp@oss.sgi.com In-Reply-To: <1185232996.10702.48.camel@edge.yarra.acx> MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_Part_165533_17498477.1185315374212" References: <6ccb8bf80707160814u3093b234l597ce029b5687fb6@mail.gmail.com> <1185232996.10702.48.camel@edge.yarra.acx> X-archive-position: 1332 X-Approved-By: makc@sgi.com X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: jason.rappleye@gmail.com Precedence: bulk X-list: pcp ------=_Part_165533_17498477.1185315374212 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline On 7/23/07, Nathan Scott wrote: > > On Tue, 2007-07-17 at 10:35 +1000, Michael Newton wrote: > > On Mon, 16 Jul 2007, Jason Rappleye wrote: > > > It would appear that Nathan Scott's kernel.all.pswitch patch > > > > > > http://oss.sgi.com/archives/pcp/2007-03/msg00007.html > > > > > > didn't make it into the 2.7.1 release. Would appreciate it if it > > made it > > > into the next release. > > > > sure. Theres a queue of stuff which i hope to get in soon. ... 
> > In the meantime, Jason, you can get the PCP packages that we use on our > production systems from here: http://oss.sgi.com/~nathans/ Excellent - I did apply the patch and rebuild the RPM locally, but I'll give those a try instead. Will be happy to provide feedback...we may end up using PCP heavily on a cluster of ~1200 nodes, so I'm sure we'll run into some problems :-) j This includes that fix and numerous others too. Its updated relatively > frequently compared to base PCP, so additional testers are very welcome > and will help catch new problems before they propogate into SGI's PCP. > > cheers. > > -- > Nathan > > ------=_Part_165533_17498477.1185315374212 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline

------=_Part_165533_17498477.1185315374212-- From kimbrr@sgi.com Wed Jul 25 01:13:03 2007 Received: with ECARTIS (v1.0.0; list pcp); Wed, 25 Jul 2007 01:13:07 -0700 (PDT) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id l6P8Cxbm018238 for ; Wed, 25 Jul 2007 01:13:01 -0700 Received: from snort.melbourne.sgi.com (snort.melbourne.sgi.com [134.14.54.149]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id SAA22222 for ; Wed, 25 Jul 2007 18:13:02 +1000 Received: from snort.melbourne.sgi.com (localhost [127.0.0.1]) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5) with ESMTP id l6P8D1eW36127953 for ; Wed, 25 Jul 2007 18:13:02 +1000 (AEST) Received: from localhost (kimbrr@localhost) by snort.melbourne.sgi.com (SGI-8.12.5/8.12.5/Submit) with ESMTP id l6P8D0tW36093543 for ; Wed, 25 Jul 2007 18:13:01 +1000 (AEST) X-Authentication-Warning: snort.melbourne.sgi.com: kimbrr owned process doing -bs Date: Wed, 25 Jul 2007 18:13:00 +1000 From: Michael Newton X-X-Sender: kimbrr@snort.melbourne.sgi.com To: pcp@oss.sgi.com Subject: [PATCH] in /etc/init.d/pcp, if TERM doesnt work, try KILL In-Reply-To: Message-ID: References: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-archive-position: 1333 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: kimbrr@sgi.com Precedence: bulk X-list: pcp I wrote: >in the current form of /etc/init.d/pcp, it is again possible without too >much extra fuss to fall back to SIGKILL when TERM doesnt work. >Heres a patch ..except that if I do that I'd better reinstate _killpmdas() as well.. 
BTW i chose this in preference to tackling the disentanglement of testing the -KILL fallback in qa/041 from that of IPC failure --- a/mgmt/pcp/src/pmcd/rc_pcp 2007-07-25 18:03:33.000000000 +1000 +++ b/mgmt/pcp/src/pmcd/rc_pcp 2007-07-25 18:01:15.005906760 +1000 @@ -4,7 +4,7 @@ # # Start or Stop the Performance Co-Pilot Daemon(s) # -# $Id: rc_pcp,v 1.44 2007/07/17 02:52:06 kimbrr Exp $ +# $Id: rc_pcp,v 1.43 2007/07/12 01:30:21 kimbrr Exp $ # # The following is for chkconfig on RedHat based systems # chkconfig: 2345 95 05 @@ -370,6 +370,46 @@ $RC_STATUS -v } +# Use $PCP_PMCDCONF_PATH to find and kill pipe/socket PMDAs created by PMCD. +# (First join up continued lines in config file) +# +_killpmdas() +{ + if [ ! -f $PCP_PMCDCONF_PATH ] + then + echo "$prog:"' +Warning: PMCD control file '"$PCP_PMCDCONF_PATH"' is missing, cannot identify PMDAs + to be terminated.' + return + fi + # Give each PMDA 2 seconds after a SIGTERM to die, then SIGKILL + for pmda in `$PCP_AWK_PROG <$PCP_PMCDCONF_PATH ' +/\\\\$/ { printf "%s ", substr($0, 0, length($0) - 1); next } + { print }' \ +| $PCP_AWK_PROG ' +$1 ~ /^#/ { next } +tolower($3) == "pipe" && NF > 4 { print $5; next } +tolower($3) == "socket" && NF > 5 { print $6; next }' \ +| sort -u` + do + $PCP_KILLALL_PROG -TERM `basename $pmda` > /dev/null 2>&1 & + done + sleep 2 + for pmda in `$PCP_AWK_PROG <$PCP_PMCDCONF_PATH ' +/\\\\$/ { printf "%s ", substr($0, 0, length($0) - 1); next } + { print }' \ +| $PCP_AWK_PROG ' +$1 ~ /^#/ { next } +tolower($3) == "pipe" && NF > 4 { print $5; next } +tolower($3) == "socket" && NF > 5 { print $6; next }' \ +| sort -u` + do + $PCP_KILLALL_PROG -KILL `basename $pmda` > /dev/null 2>&1 & + done + + wait +} + _shutdown() { # Is pmcd running? @@ -397,18 +437,12 @@ Assuming an uninstall from a chroot: PMCD not killed. If this is incorrect, kill -TERM can be applied to the above PID." exit - - # Send pmcd a SIGTERM, which is noted as a pending shutdown. 
-        # When finished the currently active request, pmcd will close any
-        # connections, wait for any agents, and then exit.
-        #
     elif [ -f $PCP_RUN_DIR/pmcd.pid ]
     then
         TOKILL=`cat $PCP_RUN_DIR/pmcd.pid`
         if grep "^$TOKILL$" $tmp.tmp >/dev/null
         then
-            kill -TERM $TOKILL >/dev/null 2>&1
-            rm -f $PCP_RUN_DIR/pmcd.pid
+            :
         else
             echo "Process ..."
             cat $tmp.tmp
@@ -419,26 +453,54 @@
             exit
         fi
     else
-        $PCP_KILLALL_PROG -TERM pmcd > /dev/null 2>&1
+        TOKILL=
     fi
+
+    # Send pmcd a SIGTERM, which is noted as a pending shutdown.
+    # When finished the currently active request, pmcd will close any
+    # connections, wait for any agents, and then exit.
+    # On failure, resort to SIGKILL.
+    #
     $ECHO $PCP_ECHO_N "Waiting for PMCD to terminate ...""$PCP_ECHO_C"
-    delay=200 # tenths of a second
-    while [ $delay -gt 0 ]
+    delay=80 # tenths of a second
+    for SIG in TERM KILL
     do
-        _get_pids_by_name pmcd >$tmp.tmp
-        [ ! -s $tmp.tmp ] && break
-        pmsleep 0.1
-        delay=`expr $delay - 1`
-        [ `expr $delay % 10` -ne 0 ] || $ECHO $PCP_ECHO_N ".""$PCP_ECHO_C"
-    done
-    if [ $delay -eq 0 ] # It just WON'T DIE, give up.
-    then
+        if [ "x$TOKILL" == "x" ]
+        then
+            $PCP_KILLALL_PROG -$SIG pmcd > /dev/null 2>&1
+        else
+            kill -$SIG $TOKILL >/dev/null 2>&1
+            rm -f $PCP_RUN_DIR/pmcd.pid
+        fi
+        while [ $delay -gt 0 ]
+        do
+            _get_pids_by_name pmcd >$tmp.tmp
+            [ ! -s $tmp.tmp ] && break 2
+            pmsleep 0.1
+            delay=`expr $delay - 1`
+            [ "$SIG" == "TERM" ] && [ `expr $delay % 10` -eq 0 ] \
+                && $ECHO $PCP_ECHO_N ".""$PCP_ECHO_C"
+        done
+        echo
         echo "Process ..."
-        cat $tmp.tmp
-        echo "$prog: Warning: PMCD won't die!"
-        exit
-    fi
-    $RC_STATUS -v
+        if [ "$SIG" == "TERM" ]
+        then
+            ps $PCP_PS_ALL_FLAGS >$tmp.ps
+            sed 1q $tmp.ps
+            for pid in `cat $tmp.tmp`
+            do
+                $PCP_AWK_PROG <$tmp.ps "\$2 == $pid { print }"
+            done
+            echo "$prog: Warning: Forcing PMCD to terminate!"
+            delay=20
+        else
+            cat $tmp.tmp
+            echo "$prog: Warning: PMCD won't die!"
+            exit
+        fi
+    done
+    _killpmdas
+    $RC_STATUS -v
     pmpost "stop pmcd from $PCP_RC_DIR/pcp"
 }

Dr.Michael("Kimba")Newton kimbrr@sgi.com

From nscott@aconex.com Tue Jul 31 16:58:17 2007
Received: with ECARTIS (v1.0.0; list pcp); Tue, 31 Jul 2007 16:58:23 -0700 (PDT)
Received: from postoffice.aconex.com (mail.app.aconex.com [203.89.192.138])
    by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l6VNwCbm009093
    for ; Tue, 31 Jul 2007 16:58:15 -0700
Received: from edge.yarra.acx (unknown [203.89.192.141])
    by postoffice.aconex.com (Postfix) with ESMTP id 38E8892C744
    for ; Wed, 1 Aug 2007 09:58:15 +1000 (EST)
Subject: pcp updates
From: Nathan Scott
Reply-To: nscott@aconex.com
To: pcp@oss.sgi.com
Content-Type: text/plain
Organization: Aconex
Date: Wed, 01 Aug 2007 09:58:37 +1000
Message-Id: <1185926317.21829.39.camel@edge.yarra.acx>
Mime-Version: 1.0
X-Mailer: Evolution 2.6.3
Content-Transfer-Encoding: 7bit
X-archive-position: 1334
X-ecartis-version: Ecartis v1.0.0
Sender: pcp-bounce@oss.sgi.com
Errors-to: pcp-bounce@oss.sgi.com
X-original-sender: nscott@aconex.com
Precedence: bulk
X-list: pcp

Changes committed to git://oss.sgi.com:8090/nathans/pcp.git

 build/mac/GNUmakefile          |    8 ++--
 build/mac/build-installer      |   70 +++++++++++++++++++++++++++++------------
 configure.in                   |    5 ++
 src/include/builddefs.in       |    1
 src/include/platform_defs.h.in |    2 -
 5 files changed, 62 insertions(+), 24 deletions(-)

commit 8b391cb6ddbeae331fb08ecf44712462b696e372
Author: Nathan Scott
Date:   Wed Aug 1 07:50:04 2007 +1000

    Add the package version number into the generated MacOSX package.

commit 57c413cdd52c268680cbb106e2600f560615232f
Author: Nathan Scott
Date:   Tue Jul 31 21:35:58 2007 +1000

    Add MacOSX packaging changes to generate a (single) disk image file.
    Thanks to Nick Blievers for pointers.

commit 1b7b490396d0006b091400728a12cbc0bd4b1383
Author: Nathan Scott
Date:   Tue Jul 31 21:33:13 2007 +1000

    Missed checkin of earlier platform_defs patch snippet, for MacOSX builds.