From nscott@aconex.com Tue Mar 6 20:23:58 2007 Received: with ECARTIS (v1.0.0; list pcp); Tue, 06 Mar 2007 20:24:04 -0800 (PST) X-Spam-oss-Status: No, score=1.0 required=5.0 tests=BAYES_50,J_BACKHAIR_11 autolearn=no version=3.2.0-pre1-r499012 Received: from postoffice.aconex.com (mail.app.aconex.com [203.89.192.138]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l274Nt6p003823 for ; Tue, 6 Mar 2007 20:23:57 -0800 Received: from edge (unknown [203.89.192.141]) by postoffice.aconex.com (Postfix) with ESMTP id 0DEF4AAC1B4; Wed, 7 Mar 2007 15:04:29 +1100 (EST) Subject: [PATCH] sginap bug fix and regression test From: Nathan Scott Reply-To: nscott@aconex.com To: Michael Newton Cc: pcp@oss.sgi.com Content-Type: multipart/mixed; boundary="=-u/lF09s1IviNUvaZqhPs" Organization: Aconex Date: Wed, 07 Mar 2007 15:22:36 +1100 Message-Id: <1173241356.5051.50.camel@edge> Mime-Version: 1.0 X-Mailer: Evolution 2.6.3 X-archive-position: 1039 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: nscott@aconex.com Precedence: bulk X-list: pcp Content-Length: 7740 Lines: 276 --=-u/lF09s1IviNUvaZqhPs Content-Type: text/plain Content-Transfer-Encoding: 7bit Hi, I found another problem in pmie. When using a large sample interval (where one hour is large, for example), the calculation of sleep time wraps in pmie, and becomes negative. This snippet of code in pmie:: sleepTight() exhibits the problem: for (;;) { /* loop to catch early wakeup from sginap */ curr = getReal(); delay = CLK_TCK * (long)(sched - curr); if (delay < 1) return; sginap(delay); Because delay is a long, and CLK_TCK is 1000000, as soon as the delta between the current time and the next scheduled time goes beyond a certain size, the multiply-by-CLK_TCK causes a wrap. When you look at the sginap implementation on non-IRIX platforms, you can see that all it does is convert back from ticks to seconds right away, and then calls sleep/usleep. 
In fixing this I've taken the approach of using nanosleep(3) instead of sginap. This is a POSIX-specified high-resolution sleep interface (exists on all platforms I've checked - IRIX, Linux, MacOSX, Cygwin, FreeBSD - I think all PCP platforms are covered), which is handy in that it returns the time remaining if interrupted (which means extra gettimeofday calls and struct timeval arithmetic aren't needed). I've added a QA test which mimics the original observed problem, and verified that it's now fixed and that the other pmie tests still pass. Over time, all sginap/sleep/usleep calls should probably be phased out in preference for nanosleep(3) - I've only tackled pmie and pmval with this patch though. cheers. -- Nathan --=-u/lF09s1IviNUvaZqhPs Content-Disposition: attachment; filename=use-nanosleep-not-sginap Content-Type: text/x-patch; name=use-nanosleep-not-sginap; charset=UTF-8 Content-Transfer-Encoding: 7bit Index: devpcp/src/pmie/src/dstruct.c =================================================================== --- devpcp.orig/src/pmie/src/dstruct.c 2007-03-07 13:23:23.270198500 +1100 +++ devpcp/src/pmie/src/dstruct.c 2007-03-07 13:23:44.487524500 +1100 @@ -182,12 +182,29 @@ reflectTime(RealTime d) } +/* convert RealTime to timeval */ +void +unrealize(RealTime rt, struct timeval *tv) +{ + tv->tv_sec = (time_t)rt; + tv->tv_usec = (int)(1000000 * (rt - tv->tv_sec)); +} + + +/* convert RealTime to timespec */ +void +unrealizenano(RealTime rt, struct timespec *ts) +{ + ts->tv_sec = (time_t)rt; + ts->tv_nsec = (int)(1000000000 * (rt - ts->tv_sec)); +} + + /* sleep until given RealTime */ void sleepTight(RealTime sched) { - RealTime curr; /* current time */ - long delay; /* interval to sleep */ + RealTime delay; /* interval to sleep */ int sts; pid_t pid; @@ -207,27 +224,21 @@ sleepTight(RealTime sched) ; } - if (archives) - return; + if (!archives) { + struct timespec ts, tleft; - for (;;) { /* loop to catch early wakeup from sginap */ - curr = getReal(); - delay = 
CLK_TCK * (long)(sched - curr); - if (delay < 1) return; - sginap(delay); + delay = sched - getReal(); + unrealizenano(delay, &ts); + for (;;) { /* loop to catch early wakeup from nanosleep */ + sts = nanosleep(&ts, &tleft); + if (sts == 0 || (sts < 0 && errno != EINTR)) + break; + ts = tleft; + } } } -/* convert RealTime to timeval */ -void -unrealize(RealTime rt, struct timeval *tv) -{ - tv->tv_sec = (time_t)rt; - tv->tv_usec = (int)(1000000 * (rt - tv->tv_sec)); -} - - /*********************************************************************** * ring buffer management ***********************************************************************/ Index: devpcp/src/pmval/pmval.c =================================================================== --- devpcp.orig/src/pmval/pmval.c 2007-03-07 13:23:30.714663750 +1100 +++ devpcp/src/pmval/pmval.c 2007-03-07 13:23:33.254822500 +1100 @@ -188,6 +188,15 @@ tsub(struct timeval t1, struct timeval t return t1; } +/* convert timeval */ +static struct timespec +tspec(struct timeval tv, struct timespec *ts) +{ + ts->tv_nsec = tv.tv_usec * 1000; + ts->tv_sec = tv.tv_sec; + return *ts; +} + /* * a : b for struct timevals ... 
<0 for a<b, ==0 for a==b, >0 for a>b */ @@ -201,23 +210,6 @@ tcmp(struct timeval *a, struct timeval * return res; } -/* first timeval has reached second timeval to within 1 tick */ -static int -reached(struct timeval t1, struct timeval t2) -{ - static struct timeval tick = { 0, 0 }; - - if (tick.tv_usec == 0) - /* one trip, usec per tick */ - tick.tv_usec = 1000000 / CLK_TCK; - - t1 = tadd(t1, tick); - - return (t1.tv_sec > t2.tv_sec) || - (t1.tv_sec == t2.tv_sec && t1.tv_usec >= t2.tv_usec); - -} - /* convert timeval to ticks - positive time only @@ -250,14 +242,18 @@ tosec(struct timeval t) static void sleeptill(struct timeval sched) { + int sts; struct timeval curr; /* current time */ - struct timeval delay; /* interval to sleep */ + struct timespec delay; /* interval to sleep */ + struct timespec left; /* remaining sleep time */ - for (;;) { /* loop to catch early wakeup by sginap */ - gettimeofday(&curr, NULL); - if (reached(curr,sched)) return; - delay = tsub(sched, curr); - sginap(toticks(delay)); + gettimeofday(&curr, NULL); + delay = tspec(tsub(sched, curr), &delay); + for (;;) { /* loop to catch early wakeup by nanosleep */ + sts == nanosleep(&delay, &left); + if (sts == 0 || (sts < 0 && errno != EINTR)) + break; + delay = left; } } @@ -1554,6 +1550,7 @@ int main(int argc, char *argv[]) { struct timeval delta; /* sample interval */ + struct timespec delay; /* nanosleep interval */ long smpls; /* number of samples */ int cols; /* width of output column */ struct timeval now; /* current task start time */ @@ -1638,7 +1635,7 @@ main(int argc, char *argv[]) /* wait till time for sample */ if (pauseFlag) { - sginap(toticks(delta)); + nanosleep(tspec(delta, &delay)); } else if (archive == NULL) { sched = tadd(now,delta); --=-u/lF09s1IviNUvaZqhPs Content-Disposition: attachment; filename=312-pmie-large-delta Content-Type: text/x-patch; name=312-pmie-large-delta; charset=UTF-8 Content-Transfer-Encoding: 7bit Index: qa/312 
=================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ qa/312 2007-03-07 10:07:42.448443000 +1100 @@ -0,0 +1,42 @@ +#! /bin/sh +# PCP QA Test No. 312 +# sginap use in pmie with large deltas on 32 bit platforms can +# wrap and cause a sleep with negative size which, funnily enough, +# also causes multiple immediate rule evaluations (which we can +# now test for, to detect pmie brokenness). +# +# Copyright (c) 2007 Nathan Scott. +# +# creator +owner=nathans + +seq=`basename $0` +echo "QA output created by $seq" + +# get standard environment, filters and checks +. ./common.product +. ./common.filter +. ./common.check + +tmp=/tmp/$$ +here=`pwd` +sudo=$here/sudo +status=1 # failure is the default! +$sudo rm -rf $tmp.* +trap "rm -f $tmp.*; exit \$status" 0 1 2 3 15 + +# real QA test starts here +echo 'load = sample.load;' | pmie -v -t 1hour >$tmp.out 2>$tmp.err & +pmie_pid=$! + +sleep 2 +kill -INT $pmie_pid +wait + +echo "pmie output ..." +cat $tmp.out +echo "pmie stderr ..." +cat $tmp.err + +# success, all done +exit Index: qa/312.out =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ qa/312.out 2007-03-07 10:08:13.026354000 +1100 @@ -0,0 +1,5 @@ +QA output created by 312 +pmie output ... +load: 42 + +pmie stderr ... 
--=-u/lF09s1IviNUvaZqhPs-- From nscott@aconex.com Tue Mar 6 21:03:33 2007 Received: with ECARTIS (v1.0.0; list pcp); Tue, 06 Mar 2007 21:03:38 -0800 (PST) X-Spam-oss-Status: No, score=0.0 required=5.0 tests=BAYES_50 autolearn=no version=3.2.0-pre1-r499012 Received: from postoffice.aconex.com (mail.app.aconex.com [203.89.192.138]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l2753V6p027876 for ; Tue, 6 Mar 2007 21:03:32 -0800 Received: from edge (unknown [203.89.192.141]) by postoffice.aconex.com (Postfix) with ESMTP id E42F0AAC350; Wed, 7 Mar 2007 15:44:08 +1100 (EST) Subject: Re: [PATCH] sginap bug fix and regression test From: Nathan Scott Reply-To: nscott@aconex.com To: Michael Newton Cc: pcp@oss.sgi.com In-Reply-To: <1173241356.5051.50.camel@edge> References: <1173241356.5051.50.camel@edge> Content-Type: multipart/mixed; boundary="=-2D8HqPVY3/Bp537FgqGQ" Organization: Aconex Date: Wed, 07 Mar 2007 16:02:16 +1100 Message-Id: <1173243736.5051.54.camel@edge> Mime-Version: 1.0 X-Mailer: Evolution 2.6.3 X-archive-position: 1040 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: nscott@aconex.com Precedence: bulk X-list: pcp Content-Length: 5576 Lines: 210 --=-2D8HqPVY3/Bp537FgqGQ Content-Type: text/plain Content-Transfer-Encoding: 7bit On Wed, 2007-03-07 at 15:22 +1100, Nathan Scott wrote: > ... > Over time, all sginap/sleep/usleep calls should probably be phased out > in preference for nanosleep(3) - I've only tackled pmie and pmval with > this patch though. I missed a quilt refresh on the pmval part - here's the latest version (other woulda had some harmless gcc warnings about unused vars/funcs, I think). You may also see some patch offset weirdness when applying the pmval chunk, because I have several other changes in that file in other patches not yet sent. cheers. 
-- Nathan --=-2D8HqPVY3/Bp537FgqGQ Content-Disposition: attachment; filename=use-nanosleep-not-sginap Content-Type: text/x-patch; name=use-nanosleep-not-sginap; charset=UTF-8 Content-Transfer-Encoding: 7bit Index: devpcp/src/pmie/src/dstruct.c =================================================================== --- devpcp.orig/src/pmie/src/dstruct.c 2007-03-07 13:23:23.270198500 +1100 +++ devpcp/src/pmie/src/dstruct.c 2007-03-07 13:23:44.487524500 +1100 @@ -182,12 +182,29 @@ reflectTime(RealTime d) } +/* convert RealTime to timeval */ +void +unrealize(RealTime rt, struct timeval *tv) +{ + tv->tv_sec = (time_t)rt; + tv->tv_usec = (int)(1000000 * (rt - tv->tv_sec)); +} + + +/* convert RealTime to timespec */ +void +unrealizenano(RealTime rt, struct timespec *ts) +{ + ts->tv_sec = (time_t)rt; + ts->tv_nsec = (int)(1000000000 * (rt - ts->tv_sec)); +} + + /* sleep until given RealTime */ void sleepTight(RealTime sched) { - RealTime curr; /* current time */ - long delay; /* interval to sleep */ + RealTime delay; /* interval to sleep */ int sts; pid_t pid; @@ -207,27 +224,21 @@ sleepTight(RealTime sched) ; } - if (archives) - return; + if (!archives) { + struct timespec ts, tleft; - for (;;) { /* loop to catch early wakeup from sginap */ - curr = getReal(); - delay = CLK_TCK * (long)(sched - curr); - if (delay < 1) return; - sginap(delay); + delay = sched - getReal(); + unrealizenano(delay, &ts); + for (;;) { /* loop to catch early wakeup from nanosleep */ + sts = nanosleep(&ts, &tleft); + if (sts == 0 || (sts < 0 && errno != EINTR)) + break; + ts = tleft; + } } } -/* convert RealTime to timeval */ -void -unrealize(RealTime rt, struct timeval *tv) -{ - tv->tv_sec = (time_t)rt; - tv->tv_usec = (int)(1000000 * (rt - tv->tv_sec)); -} - - /*********************************************************************** * ring buffer management ***********************************************************************/ Index: devpcp/src/pmval/pmval.c 
=================================================================== --- devpcp.orig/src/pmval/pmval.c 2007-03-07 13:23:30.714663750 +1100 +++ devpcp/src/pmval/pmval.c 2007-03-07 14:13:11.920977500 +1100 @@ -201,44 +201,6 @@ tcmp(struct timeval *a, struct timeval * return res; } -/* first timeval has reached second timeval to within 1 tick */ -static int -reached(struct timeval t1, struct timeval t2) -{ - static struct timeval tick = { 0, 0 }; - - if (tick.tv_usec == 0) - /* one trip, usec per tick */ - tick.tv_usec = 1000000 / CLK_TCK; - - t1 = tadd(t1, tick); - - return (t1.tv_sec > t2.tv_sec) || - (t1.tv_sec == t2.tv_sec && t1.tv_usec >= t2.tv_usec); - -} - - -/* convert timeval to ticks - - positive time only - - accurate to 1 tick */ -static long -toticks(struct timeval t) -{ - static int ticks_per_sec = 0; - long ticks; - - if (ticks_per_sec == 0) - ticks_per_sec = CLK_TCK; - - ticks = ticks_per_sec * t.tv_sec + ticks_per_sec * t.tv_usec/1000000; - - if (ticks > 0) - return ticks; - else - return 1L; -} - /* convert timeval to seconds */ static double tosec(struct timeval t) @@ -246,18 +208,32 @@ tosec(struct timeval t) return t.tv_sec + (t.tv_usec / 1000000.0); } +/* convert timeval to timespec */ +static struct timespec * +tospec(struct timeval tv, struct timespec *ts) +{ + ts->tv_nsec = tv.tv_usec * 1000; + ts->tv_sec = tv.tv_sec; + return ts; +} + + /* sleep until given timeval */ static void sleeptill(struct timeval sched) { + int sts; struct timeval curr; /* current time */ - struct timeval delay; /* interval to sleep */ + struct timespec delay; /* interval to sleep */ + struct timespec left; /* remaining sleep time */ - for (;;) { /* loop to catch early wakeup by sginap */ - gettimeofday(&curr, NULL); - if (reached(curr,sched)) return; - delay = tsub(sched, curr); - sginap(toticks(delay)); + gettimeofday(&curr, NULL); + tospec(tsub(sched, curr), &delay); + for (;;) { /* loop to catch early wakeup by nanosleep */ + sts = nanosleep(&delay, &left); + if 
(sts == 0 || (sts < 0 && errno != EINTR)) + break; + delay = left; } } @@ -1554,6 +1530,8 @@ int main(int argc, char *argv[]) { struct timeval delta; /* sample interval */ + struct timespec delay; /* nanosleep interval */ + struct timespec left; /* nanosleep remainder */ long smpls; /* number of samples */ int cols; /* width of output column */ struct timeval now; /* current task start time */ @@ -1638,7 +1616,7 @@ main(int argc, char *argv[]) /* wait till time for sample */ if (pauseFlag) { - sginap(toticks(delta)); + nanosleep(tospec(delta, &delay), &left); } else if (archive == NULL) { sched = tadd(now,delta); --=-2D8HqPVY3/Bp537FgqGQ-- From nscott@aconex.com Wed Mar 7 20:53:47 2007 Received: with ECARTIS (v1.0.0; list pcp); Wed, 07 Mar 2007 20:53:53 -0800 (PST) X-Spam-oss-Status: No, score=0.0 required=5.0 tests=BAYES_50 autolearn=no version=3.2.0-pre1-r499012 Received: from postoffice.aconex.com (mail.app.aconex.com [203.89.192.138]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l284ri6p024256 for ; Wed, 7 Mar 2007 20:53:46 -0800 Received: from edge (unknown [203.89.192.141]) by postoffice.aconex.com (Postfix) with ESMTP id 5CDE8AAC237; Thu, 8 Mar 2007 15:34:11 +1100 (EST) Subject: [PATCH] further instantaneous vs discrete metric descriptor fixes From: Nathan Scott Reply-To: nscott@aconex.com To: Michael Newton Cc: pcp@oss.sgi.com Content-Type: multipart/mixed; boundary="=-D6PlAT63Ql7lY6QcYgf6" Organization: Aconex Date: Thu, 08 Mar 2007 15:52:36 +1100 Message-Id: <1173329556.5051.64.camel@edge> Mime-Version: 1.0 X-Mailer: Evolution 2.6.3 X-archive-position: 1049 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: nscott@aconex.com Precedence: bulk X-list: pcp Content-Length: 4710 Lines: 127 --=-D6PlAT63Ql7lY6QcYgf6 Content-Type: text/plain Content-Transfer-Encoding: 7bit Hi, This caused some pmie problems for us this time, when running pmie rules against archives which had used 
"log once" semantics for the filesys.capacity metric in particular. A quick audit found several affected PMDAs and metrics. cheers. -- Nathan --=-D6PlAT63Ql7lY6QcYgf6 Content-Disposition: attachment; filename=fix-more-discretes Content-Type: text/x-patch; name=fix-more-discretes; charset=UTF-8 Content-Transfer-Encoding: 7bit Index: devpcp/src/pmdas/darwin/pmda.c =================================================================== --- devpcp.orig/src/pmdas/darwin/pmda.c 2007-03-08 11:54:28.488458000 +1100 +++ devpcp/src/pmdas/darwin/pmda.c 2007-03-08 11:56:42.404827250 +1100 @@ -176,11 +176,11 @@ static pmdaMetric metrictab[] = { /* hinv.physmem */ { NULL, { PMDA_PMID(CLUSTER_VMSTAT,2), PM_TYPE_U32, PM_INDOM_NULL, - PM_SEM_INSTANT, PMDA_PMUNITS(1,0,0,PM_SPACE_MBYTE,0,0) }, }, + PM_SEM_DISCRETE, PMDA_PMUNITS(1,0,0,PM_SPACE_MBYTE,0,0) }, }, /* mem.physmem */ { NULL, { PMDA_PMID(CLUSTER_VMSTAT,3), PM_TYPE_U64, PM_INDOM_NULL, - PM_SEM_INSTANT, PMDA_PMUNITS(1,0,0,PM_SPACE_KBYTE,0,0) }, }, + PM_SEM_DISCRETE, PMDA_PMUNITS(1,0,0,PM_SPACE_KBYTE,0,0) }, }, /* mem.freemem */ { NULL, { PMDA_PMID(CLUSTER_VMSTAT,4), PM_TYPE_U64, PM_INDOM_NULL, @@ -299,7 +299,7 @@ static pmdaMetric metrictab[] = { /* filesys.capacity */ { NULL, { PMDA_PMID(CLUSTER_FILESYS,32), PM_TYPE_U64, FILESYS_INDOM, - PM_SEM_INSTANT, PMDA_PMUNITS(1,0,0,PM_SPACE_KBYTE,0,0) }, }, + PM_SEM_DISCRETE, PMDA_PMUNITS(1,0,0,PM_SPACE_KBYTE,0,0) }, }, /* filesys.used */ { NULL, { PMDA_PMID(CLUSTER_FILESYS,33), PM_TYPE_U64, FILESYS_INDOM, Index: devpcp/src/pmdas/linux/pmda.c =================================================================== --- devpcp.orig/src/pmdas/linux/pmda.c 2007-03-08 11:54:28.400452500 +1100 +++ devpcp/src/pmdas/linux/pmda.c 2007-03-08 12:06:11.736408250 +1100 @@ -579,7 +579,7 @@ static pmdaMetric metrictab[] = { /* hinv.physmem */ { NULL, - { PMDA_PMID(CLUSTER_MEMINFO,9), PM_TYPE_U32, PM_INDOM_NULL, PM_SEM_INSTANT, + { PMDA_PMID(CLUSTER_MEMINFO,9), PM_TYPE_U32, PM_INDOM_NULL, PM_SEM_DISCRETE, 
PMDA_PMUNITS(1,0,0,PM_SPACE_MBYTE,0,0) }, }, /* mem.freemem */ @@ -805,7 +805,7 @@ static pmdaMetric metrictab[] = { /* filesys.capacity */ { NULL, - { PMDA_PMID(CLUSTER_FILESYS,1), PM_TYPE_U64, FILESYS_INDOM, PM_SEM_INSTANT, + { PMDA_PMID(CLUSTER_FILESYS,1), PM_TYPE_U64, FILESYS_INDOM, PM_SEM_DISCRETE, PMDA_PMUNITS(1,0,0,PM_SPACE_KBYTE,0,0) } }, /* filesys.used */ @@ -820,7 +820,7 @@ static pmdaMetric metrictab[] = { /* filesys.maxfiles */ { NULL, - { PMDA_PMID(CLUSTER_FILESYS,4), PM_TYPE_U32, FILESYS_INDOM, PM_SEM_INSTANT, + { PMDA_PMID(CLUSTER_FILESYS,4), PM_TYPE_U32, FILESYS_INDOM, PM_SEM_DISCRETE, PMDA_PMUNITS(0,0,0,0,0,0) } }, /* filesys.usedfiles */ @@ -868,17 +868,17 @@ static pmdaMetric metrictab[] = { /* swapdev.length */ { NULL, - { PMDA_PMID(CLUSTER_SWAPDEV,1), PM_TYPE_U32, SWAPDEV_INDOM, PM_SEM_INSTANT, + { PMDA_PMID(CLUSTER_SWAPDEV,1), PM_TYPE_U32, SWAPDEV_INDOM, PM_SEM_DISCRETE, PMDA_PMUNITS(1,0,0,PM_SPACE_KBYTE,0,0) } }, /* swapdev.maxswap */ { NULL, - { PMDA_PMID(CLUSTER_SWAPDEV,2), PM_TYPE_U32, SWAPDEV_INDOM, PM_SEM_INSTANT, + { PMDA_PMID(CLUSTER_SWAPDEV,2), PM_TYPE_U32, SWAPDEV_INDOM, PM_SEM_DISCRETE, PMDA_PMUNITS(1,0,0,PM_SPACE_KBYTE,0,0) } }, /* swapdev.vlength */ { NULL, - { PMDA_PMID(CLUSTER_SWAPDEV,3), PM_TYPE_U32, SWAPDEV_INDOM, PM_SEM_INSTANT, + { PMDA_PMID(CLUSTER_SWAPDEV,3), PM_TYPE_U32, SWAPDEV_INDOM, PM_SEM_DISCRETE, PMDA_PMUNITS(1,0,0,PM_SPACE_KBYTE,0,0) } }, /* swapdev.priority */ @@ -2485,7 +2485,7 @@ static pmdaMetric metrictab[] = { /* hinv.machine */ { NULL, - { PMDA_PMID(CLUSTER_CPUINFO, 7), PM_TYPE_STRING, PM_INDOM_NULL, PM_SEM_INSTANT, + { PMDA_PMID(CLUSTER_CPUINFO, 7), PM_TYPE_STRING, PM_INDOM_NULL, PM_SEM_DISCRETE, PMDA_PMUNITS(0,0,0,0,0,0) } }, /* Index: devpcp/src/pmdas/windows/data.c =================================================================== --- devpcp.orig/src/pmdas/windows/data.c 2007-03-08 11:57:14.242817000 +1100 +++ devpcp/src/pmdas/windows/data.c 2007-03-08 11:57:48.040929250 +1100 @@ -657,7 +657,7 @@ 
static struct { }, /* filesys.capacity */ - { { PMDA_PMID(0,117), PM_TYPE_U64, LDISK_INDOM, PM_SEM_INSTANT, + { { PMDA_PMID(0,117), PM_TYPE_U64, LDISK_INDOM, PM_SEM_DISCRETE, PMDA_PMUNITS(1,0,0,PM_SPACE_KBYTE,0,0) }, Q_LDISK, M_NONE, "" }, --=-D6PlAT63Ql7lY6QcYgf6-- From nscott@aconex.com Sun Mar 18 22:57:02 2007 Received: with ECARTIS (v1.0.0; list pcp); Sun, 18 Mar 2007 22:57:07 -0700 (PDT) X-Spam-oss-Status: No, score=0.0 required=5.0 tests=BAYES_50 autolearn=no version=3.2.0-pre1-r499012 Received: from postoffice.aconex.com (mail.app.aconex.com [203.89.192.138]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l2J5v06p023376 for ; Sun, 18 Mar 2007 22:57:01 -0700 Received: from edge (unknown [203.89.192.141]) by postoffice.aconex.com (Postfix) with ESMTP id 1F0C9AAC2A4; Mon, 19 Mar 2007 16:56:59 +1100 (EST) Subject: [PATCH] fix objstyle script issues From: Nathan Scott Reply-To: nscott@aconex.com To: Michael Newton Cc: pcp@oss.sgi.com Content-Type: multipart/mixed; boundary="=-LRv28Ob5szvICThsKRON" Organization: Aconex Date: Mon, 19 Mar 2007 16:57:18 +1100 Message-Id: <1174283838.5051.283.camel@edge> Mime-Version: 1.0 X-Mailer: Evolution 2.6.3 X-archive-position: 1120 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: nscott@aconex.com Precedence: bulk X-list: pcp Content-Length: 2554 Lines: 80 --=-LRv28Ob5szvICThsKRON Content-Type: text/plain Content-Transfer-Encoding: 7bit These patches fix up a couple of problems with the objstyle script which is run as part of the pmdas/pmcd build: 1. shell syntax issue (?) for some versions of /bin/sh 2. location of uncompressed magic file on some Linux distros cheers. 
-- Nathan --=-LRv28Ob5szvICThsKRON Content-Disposition: attachment; filename=fix-objstyle Content-Type: text/plain; name=fix-objstyle; charset=UTF-8 Content-Transfer-Encoding: 7bit src/pmdas/pmcd/src 510> make ./objstyle: line 42: [: missing `]' Index: devpcp/src/pmdas/pmcd/src/objstyle =================================================================== --- devpcp.orig/src/pmdas/pmcd/src/objstyle 2007-03-19 16:33:58.228789000 +1100 +++ devpcp/src/pmdas/pmcd/src/objstyle 2007-03-19 16:34:03.389111500 +1100 @@ -39,7 +39,7 @@ then elif [ -f /usr/share/magic ] then appstyle=`file -m /usr/share/magic dummy.o` -elif [ -f /etc/magic] +elif [ -f /etc/magic ] then appstyle=`file -m /etc/magic dummy.o` else --=-LRv28Ob5szvICThsKRON Content-Disposition: attachment; filename=update-objstyle Content-Type: text/x-patch; name=update-objstyle; charset=UTF-8 Content-Transfer-Encoding: 7bit Index: devpcp/src/pmdas/pmcd/src/objstyle =================================================================== --- devpcp.orig/src/pmdas/pmcd/src/objstyle 2007-03-19 16:48:53.220722500 +1100 +++ devpcp/src/pmdas/pmcd/src/objstyle 2007-03-19 16:49:49.380232250 +1100 @@ -36,6 +36,9 @@ cc -c dummy.c if [ -f /usr/share/misc/magic ] then appstyle=`file -m /usr/share/misc/magic dummy.o` +elif [ -f /usr/share/file/magic ] +then + appstyle=`file -m /usr/share/file/magic dummy.o` elif [ -f /usr/share/magic ] then appstyle=`file -m /usr/share/magic dummy.o` --=-LRv28Ob5szvICThsKRON Content-Disposition: attachment; filename=fix-magic-605 Content-Type: text/x-patch; name=fix-magic-605; charset=UTF-8 Content-Transfer-Encoding: 7bit Index: qa/605 =================================================================== --- qa.orig/605 2007-03-19 16:51:37.759005500 +1100 +++ qa/605 2007-03-19 16:51:25.786257250 +1100 @@ -104,6 +104,9 @@ then if [ -f /usr/share/misc/magic ] then appstyle=`file -m /usr/share/misc/magic $PCP_DEMOS_DIR/trace/app1` + elif [ -f /usr/share/file/magic ] + then + appstyle=`file -m 
/usr/share/file/magic $PCP_DEMOS_DIR/trace/app1` elif [ -f /usr/share/magic ] then appstyle=`file -m /usr/share/magic $PCP_DEMOS_DIR/trace/app1` --=-LRv28Ob5szvICThsKRON-- From nscott@aconex.com Sun Mar 18 23:02:14 2007 Received: with ECARTIS (v1.0.0; list pcp); Sun, 18 Mar 2007 23:02:18 -0700 (PDT) X-Spam-oss-Status: No, score=0.0 required=5.0 tests=BAYES_50 autolearn=no version=3.2.0-pre1-r499012 Received: from postoffice.aconex.com (mail.app.aconex.com [203.89.192.138]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l2J62D6p024396 for ; Sun, 18 Mar 2007 23:02:14 -0700 Received: from edge (unknown [203.89.192.141]) by postoffice.aconex.com (Postfix) with ESMTP id 0C8C2AAC2A4; Mon, 19 Mar 2007 17:02:12 +1100 (EST) Subject: [PATCH] fix pmcd agent vs empty /var/tmp/pmie From: Nathan Scott Reply-To: nscott@aconex.com To: Michael Newton Cc: pcp@oss.sgi.com Content-Type: multipart/mixed; boundary="=-MoG0SngLowdHdON/FxH7" Organization: Aconex Date: Mon, 19 Mar 2007 17:02:31 +1100 Message-Id: <1174284151.5051.290.camel@edge> Mime-Version: 1.0 X-Mailer: Evolution 2.6.3 X-archive-position: 1121 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: nscott@aconex.com Precedence: bulk X-list: pcp Content-Length: 1705 Lines: 57 --=-MoG0SngLowdHdON/FxH7 Content-Type: text/plain Content-Transfer-Encoding: 7bit Hi, There's a problem with the pmcd PMDA's handling of the /var/tmp/pmie directory. Currently, if that directory disappears entirely (which the "/etc/init.d/pmie stop" mechanism tends to do), then the change to the instance domain is not properly detected in the pmcd PMDA, and it continues to export values for the no-longer-running pmies. This fixes that by also updating the indom on stat(2) failure. cheers. 
-- Nathan --=-MoG0SngLowdHdON/FxH7 Content-Disposition: attachment; filename=fix-pmie-empty-instance Content-Type: text/x-patch; name=fix-pmie-empty-instance; charset=UTF-8 Content-Transfer-Encoding: 7bit Index: devpcp/src/pmdas/pmcd/src/pmcd.c =================================================================== --- devpcp.orig/src/pmdas/pmcd/src/pmcd.c 2007-03-19 16:32:24.926958000 +1100 +++ devpcp/src/pmdas/pmcd/src/pmcd.c 2007-03-19 16:32:46.680317500 +1100 @@ -417,11 +417,8 @@ refresh_pmie_indom(void) npmies = 0; /* open the directory iterate through mmaping as we go */ - if ((pmiedir = opendir(PMIE_DIR)) == NULL) { - __pmNotifyErr(LOG_ERR, "pmcd pmda cannot open %s: %s", - PMIE_DIR, strerror(errno)); + if ((pmiedir = opendir(PMIE_DIR)) == NULL) return 0; - } /* NOTE: all valid files are already mmapped by pmie */ while ((dp = readdir(pmiedir)) != NULL) { size = (npmies+1) * sizeof(pmie_t); @@ -470,6 +467,12 @@ refresh_pmie_indom(void) } closedir(pmiedir); } + } else { + if (pmies) { + free(pmies); + pmies = NULL; + } + npmies = 0; } setoserror(0); return npmies; --=-MoG0SngLowdHdON/FxH7-- From nscott@aconex.com Tue Mar 20 19:40:40 2007 Received: with ECARTIS (v1.0.0; list pcp); Tue, 20 Mar 2007 19:40:45 -0700 (PDT) X-Spam-oss-Status: No, score=4.0 required=5.0 tests=BAYES_60 autolearn=no version=3.2.0-pre1-r499012 Received: from postoffice.aconex.com (mail.app.aconex.com [203.89.192.138]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l2L2eb6p022343 for ; Tue, 20 Mar 2007 19:40:39 -0700 Received: from edge (unknown [203.89.192.141]) by postoffice.aconex.com (Postfix) with ESMTP id 6AEBFAAC391; Wed, 21 Mar 2007 13:40:36 +1100 (EST) Subject: [PATCH] pmie log file rotation From: Nathan Scott Reply-To: nscott@aconex.com To: Michael Newton Cc: pcp@oss.sgi.com Content-Type: multipart/mixed; boundary="=-6Y2ToNLhbNj3kdN3X4tr" Organization: Aconex Date: Wed, 21 Mar 2007 13:41:10 +1100 Message-Id: <1174444870.5051.340.camel@edge> Mime-Version: 1.0 
X-Mailer: Evolution 2.6.3 X-archive-position: 1137 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: nscott@aconex.com Precedence: bulk X-list: pcp Content-Length: 37352 Lines: 1334 --=-6Y2ToNLhbNj3kdN3X4tr Content-Type: text/plain Content-Transfer-Encoding: 7bit Hi there, Following is a patch which implements log file rotation for pmie daemons, using a similar mechanism to that used for the archives from pmlogger. This is useful when using multiple pmie actions for all rules, one of which is to write to the local log (every delta, potentially, so unbounded logfile growth becomes an issue) and the other action is to send performance events to an "event clearinghouse" (i.e. one of the many management frameworks). This provides an audit trail in the situation where the events are being sent across an unreliable network, for example, and one needs to know whether a problem was detected at all, or whether the problem report was sent but lost in the ether. The patches provide: - pmie_daily log rotation, with log file culling and compression, which builds on the control file that pmie_check already uses; - ability for pmie to catch SIGHUP in daemon mode and to start writing to a new log file; - a libpcp API extension "__pmRotateLog" which is an extension to the existing "__pmOpenLog" API, for daemons. - some minor fixes for issues in man pages, pmlogger scripts, etc. - two QA scripts to exercise the new functionality. I also noticed that pmlogger_daily uses the compress(1) program as its default compression method; however, that only exists on IRIX. So, I updated that to use bzip2(1) instead, which should exist on all supported PCP platforms (there is a command line option to use a different tool, of course, but we should start with something we know exists, probably...). 
There is an associated piece in the QA test patches which ensures existing QA tests continue to work for both old and new pmlogger_daily versions. cheers. -- Nathan --=-6Y2ToNLhbNj3kdN3X4tr Content-Disposition: attachment; filename=pmie-log-rotation Content-Type: text/x-patch; name=pmie-log-rotation; charset=UTF-8 Content-Transfer-Encoding: 7bit Index: devpcp/src/pmie/src/pmie.c =================================================================== --- devpcp.orig/src/pmie/src/pmie.c 2007-03-16 17:19:39.709133250 +1100 +++ devpcp/src/pmie/src/pmie.c 2007-03-20 13:45:07.183300750 +1100 @@ -71,6 +71,7 @@ static char *prompt = "pmie> "; static char *intro = "Performance Co-Pilot Inference Engine (pmie), " "Version %s\n\n%s%s"; +static FILE *logfp; static char logfile[MAXPATHLEN+1]; static char perffile[PMIE_PATHSIZE]; /* /var/tmp/ file name */ @@ -396,6 +397,42 @@ sigbye(int sig) static void +remap_stdout_stderr(void) +{ + int i, j; + + fflush(stderr); + fflush(stdout); + setlinebuf(stderr); + setlinebuf(stdout); + i = fileno(stdout); + close(i); + if ((j = dup(fileno(stderr))) != i) + fprintf(stderr, "%s: Warning: failed to link stdout ... 
" + "dup() returns %d, expected %d (stderr=%d)\n", + pmProgname, j, i, fileno(stderr)); +} + +/*ARGSUSED*/ +static void +sighupproc(int sig) +{ + FILE *fp; + int sts; + + fp = __pmRotateLog(pmProgname, logfile, logfp, &sts); + if (sts != 0) { + fprintf(stderr, "pmie: PID = %d, default host = %s\n\n", + (int)getpid(), dfltHost); + remap_stdout_stderr(); + logfp = fp; + } else { + __pmNotifyErr(LOG_ERR, "pmie: log rotation failed\n"); + } +} + + +static void dotraceback(void) { #if HAVE_TRACE_BACK_STACK @@ -653,7 +690,6 @@ getargs(int argc, char *argv[]) perf = &instrument; if (isdaemon) { /* daemon mode */ - signal(SIGHUP, SIG_IGN); signal(SIGTTOU, SIG_IGN); signal(SIGTTIN, SIG_IGN); signal(SIGTSTP, SIG_IGN); @@ -669,30 +705,22 @@ getargs(int argc, char *argv[]) } if (commandlog != NULL) { - __pmOpenLog(pmProgname, commandlog, stderr, &sts); + logfp = __pmOpenLog(pmProgname, commandlog, stderr, &sts); if (realpath(commandlog, logfile) == NULL) { fprintf(stderr, "%s: cannot find realpath for log %s: %s\n", pmProgname, commandlog, strerror(oserror())); exit(1); } + signal(SIGHUP, isdaemon ? sighupproc : SIG_IGN); + } else { + signal(SIGHUP, SIG_IGN); } - if (bflag) { - /* - * -b ... force line buffering and stdout onto stderr - */ - int i, j; - - fflush(stderr); - fflush(stdout); - setlinebuf(stderr); - setlinebuf(stdout); - i = fileno(stdout); - close(i); - if ((j = dup(fileno(stderr))) != i) - fprintf(stderr, "%s: Warning: failed to link stdout (-b option) " - "... dup() returns %d, expected %d\n", pmProgname, j, i); - } + /* + * -b ... 
force line buffering and stdout onto stderr + */ + if (bflag || isdaemon) + remap_stdout_stderr(); if (__pmGetLicense(PM_LIC_MON, pmProgname, GET_LICENSE_SHOW_EXP) == PM_LIC_MON || @@ -771,7 +799,9 @@ getargs(int argc, char *argv[]) */ if (isdaemon) { /* daemon mode */ - close(fileno(stdin)); /* ensure stdin closed for daemon */ + /* Note: we can no longer close stdin here, as it can really + * confuse remap_stdout_stderr() during log rotation! + */ setsid(); /* not process group leader, lose controlling tty */ } Index: devpcp/src/pmie/GNUmakefile =================================================================== --- devpcp.orig/src/pmie/GNUmakefile 2007-03-16 17:19:39.769137000 +1100 +++ devpcp/src/pmie/GNUmakefile 2007-03-16 17:19:46.001526500 +1100 @@ -28,7 +28,7 @@ include $(TOPDIR)/src/include/builddefs SUBDIRS = src examples LSRCFILES = control etc_init.d_pmie pmie_check.sh crontab config.default \ - pmie2col stomp.install + pmie2col stomp.install pmie_daily.sh LDIRT = control.install config.default.install CFG_DIR = $(PCP_VAR_DIR)/config/pmie @@ -49,6 +49,7 @@ install:: default $(INSTALL) -m 644 stomp.install $(CFG_DIR)/stomp $(INSTALL) -m 755 etc_init.d_pmie $(PCP_RC_DIR)/pmie $(INSTALL) -m 755 pmie_check.sh $(PCP_BINADM_DIR)/pmie_check + $(INSTALL) -m 755 pmie_daily.sh $(PCP_BINADM_DIR)/pmie_daily $(INSTALL) -m 755 pmie2col $(PCP_BIN_DIR)/pmie2col include $(BUILDRULES) Index: devpcp/src/pmie/pmie_daily.sh =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ devpcp/src/pmie/pmie_daily.sh 2007-03-21 08:49:21.945955250 +1100 @@ -0,0 +1,477 @@ +#! /bin/sh +#Tag 0x00010D13 +# +# Copyright (c) 1995-2000,2003 Silicon Graphics, Inc. All Rights Reserved. +# Portions Copyright (c) 2007 Aconex. All Rights Reserved. 
+# +# This program is free software; you can redistribute it and/or modify it +# under the terms of the GNU General Public License as published by the +# Free Software Foundation; either version 2 of the License, or (at your +# option) any later version. +# +# This program is distributed in the hope that it will be useful, but +# WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY +# or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +# for more details. +# +# You should have received a copy of the GNU General Public License along +# with this program; if not, write to the Free Software Foundation, Inc., +# 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA +# +# Contact information: Silicon Graphics, Inc., 1500 Crittenden Lane, +# Mountain View, CA 94043, USA, or: http://www.sgi.com +# +# Example daily administrative script for pmie logfiles +# + +# Get standard environment +. /etc/pcp.env + +# Get the portable PCP rc script functions +if [ -f $PCP_SHARE_DIR/lib/rc-proc.sh ] ; then + . $PCP_SHARE_DIR/lib/rc-proc.sh +fi + +# added to handle problem when /var/log/pcp is a symlink, as first +# reported by Micah_Altman@harvard.edu in Nov 2001 +# +_unsymlink_path() +{ + [ -z "$1" ] && return + __d=`dirname $1` + __real_d=`cd $__d 2>/dev/null && /bin/pwd` + if [ -z "$__real_d" ] + then + echo $1 + else + echo $__real_d/`basename $1` + fi +} + +# error messages should go to stderr, not the GUI notifiers +# +unset PCP_STDERR + +# constant setup +# +tmp=/tmp/$$ +status=0 +echo >$tmp.lock +trap "rm -f \`[ -f $tmp.lock ] && cat $tmp.lock\` $tmp.*; exit \$status" 0 1 2 3 15 +prog=`basename $0` + +# control file for pmie administration ... 
edit the entries in this +# file to reflect your local configuration (see also -c option below) +# +CONTROL=$PCP_VAR_DIR/config/pmie/control + +# default number of days to keep pmie logfiles +# +CULLAFTER=14 + +# default compression program +# +COMPRESS=bzip2 +COMPRESSAFTER="" +COMPRESSREGEX=".meta$|.index$|.Z$|.gz$|.bz2$|.zip$" + +# determine real name for localhost +LOCALHOSTNAME=`hostname | sed -e 's/\..*//'` +[ -z "$LOCALHOSTNAME" ] && LOCALHOSTNAME=localhost + +# determine path for pwd command to override shell built-in +# (see BugWorks ID #595416). +PWDCMND=`which pwd 2>/dev/null | $PCP_AWK_PROG ' +BEGIN { i = 0 } +/ not in / { i = 1 } +/ aliased to / { i = 1 } + { if ( i == 0 ) print } +'` +if [ -z "$PWDCMND" ] +then + # Looks like we have no choice here... + # force it to a known IRIX location + PWDCMND=/bin/pwd +fi + +_usage() +{ + cat - <$tmp.lock +} + +# filter for pmie log files in working directory - +# pass in the number of days to skip over (backwards) from today +# +# pv:821339 too many sed commands for IRIX ... split into groups +# of at most 200 days +# +_date_filter() +{ + # start with all files whose names match the patterns used by + # the PCP pmie log file management scripts ... 
this list may be + # reduced by the sed filtering later on + # + ls | sed -n >$tmp.in -e '/[-.][12][0-9][0-9][0-9][0-1][0-9][0-3][0-9]$/p' + + i=0 + while [ $i -le $1 ] + do + dmax=`expr $i + 200` + [ $dmax -gt $1 ] && dmax=$1 + echo "/[-.][12][0-9][0-9][0-9][0-1][0-9][0-3][0-9]$/{" >$tmp.sed1 + while [ $i -le $dmax ] + do + x=`pmdate -${i}d %Y%m%d` + echo "/[-.]$x\$/d" >>$tmp.sed1 + i=`expr $i + 1` + done + echo "p" >>$tmp.sed1 + echo "}" >>$tmp.sed1 + + # cull file names with matching dates, keep other file names + # + sed -n -f $tmp.sed1 <$tmp.in >$tmp.tmp + mv $tmp.tmp $tmp.in + done + + cat $tmp.in +} + + +rm -f $tmp.err +line=0 +version='' +cat $CONTROL \ +| sed -e "s/LOCALHOSTNAME/$LOCALHOSTNAME/g" \ + -e "s;PCP_LOG_DIR;$PCP_LOG_DIR;g" \ +| while read host socks logfile args +do + logfile=`_unsymlink_path $logfile` + line=`expr $line + 1` + $VERY_VERBOSE && echo "[control:$line] host=\"$host\" socks=\"$socks\" log=\"$logfile\" args=\"$args\"" + case "$host" + in + \#*|'') # comment or empty + continue + ;; + + \$*) # in-line variable assignment + $SHOWME && echo "# $host $socks $logfile $args" + cmd=`echo "$host $socks $logfile $args" \ + | sed -n \ + -e "/='/s/\(='[^']*'\).*/\1/" \ + -e '/="/s/\(="[^"]*"\).*/\1/' \ + -e '/=[^"'"'"']/s/[;&<>|].*$//' \ + -e '/^\\$[A-Za-z][A-Za-z0-9_]*=/{ +s/^\\$// +s/^\([A-Za-z][A-Za-z0-9_]*\)=/export \1; \1=/p +}'` + if [ -z "$cmd" ] + then + # in-line command, not a variable assignment + _warning "in-line command is not a variable assignment, line ignored" + else + case "$cmd" + in + 'export PATH;'*) + _warning "cannot change \$PATH, line ignored" + ;; + 'export IFS;'*) + _warning "cannot change \$IFS, line ignored" + ;; + *) + $SHOWME && echo "+ $cmd" + eval $cmd + ;; + esac + fi + continue + ;; + esac + + if [ -z "$socks" -o -z "$logfile" -o -z "$args" ] + then + _error "insufficient fields in control file record" + continue + fi + + if $VERY_VERBOSE + then + echo "Check pmie -h $host ... in $dir ..." 
+ fi + + dir=`dirname $logfile` + if [ ! -d "$dir" ] + then + _error "logfile directory ($dir) does not exist" + continue + fi + + cd $dir + dir=`$PWDCMND` + $SHOWME && echo "+ cd $dir" + + if $VERBOSE + then + echo + echo "=== daily maintenance of pmie log files for host $host ===" + echo + fi + + if [ ! -w $dir ] + then + echo "$prog: Warning: no write access in $dir, skip lock file processing" + else + # demand mutual exclusion + # + fail=true + rm -f $tmp.stamp + for try in 1 2 3 4 + do + if pmlock -v lock >$tmp.out + then + echo $dir/lock >$tmp.lock + fail=false + break + else + if [ ! -f $tmp.stamp ] + then + touch -t `pmdate -30M %Y%m%d%H%M` $tmp.stamp + fi + if [ ! -z "`find lock -newer $tmp.stamp -print 2>/dev/null`" ] + then + : + else + echo "$prog: Warning: removing lock file older than 30 minutes" + LC_TIME=POSIX ls -l $dir/lock + rm -f lock + fi + fi + sleep 5 + done + + if $fail + then + # failed to gain mutex lock + # + if [ -f lock ] + then + echo "$prog: Warning: is another PCP cron job running concurrently?" + LC_TIME=POSIX ls -l $dir/lock + else + echo "$prog: `cat $tmp.out`" + fi + _warning "failed to acquire exclusive lock ($dir/lock) ..." + continue + fi + fi + + # match $logfile and $fqdn from control file to running pmies + pid="" + fqdn=`pmhostname $host` + for file in `ls $PCP_TMP_DIR/pmie` + do + p_id=$file + file="$PCP_TMP_DIR/pmie/$file" + p_logfile="" + p_pmcd_host="" + + case "$PCP_PLATFORM" + in + irix) + test -f /proc/pinfo/$p_id + ;; + *) + test -e /proc/$p_id + ;; + esac + if [ $? 
-eq 0 ] + then + eval `tr '\0' '\012' < $file | sed -e '/^$/d' | sed -e 3q | $PCP_AWK_PROG ' +NR == 2 { printf "p_logfile=\"%s\"\n", $0; next } +NR == 3 { printf "p_pmcd_host=\"%s\"\n", $0; next } + { next }'` + p_logfile=`_unsymlink_path $p_logfile` + if [ "$p_logfile" = $logfile -a "$p_pmcd_host" = "$fqdn" ] + then + pid=$p_id + break + fi + else + # ignore, it's not a running process + eval $RM -f $file + fi + done + + if [ -z "$pid" ] + then + _error "no pmie instance running for host \"$host\"" + else + if [ "`echo $pid | wc -w`" -gt 1 ] + then + _error "multiple pmie instances running for host \"$host\", processes: $pid" + _unlock + continue + fi + + # now move current logfile name aside and SIGHUP to "roll the logs" + # creating a new logfile with the old name in the process. + # + $SHOWME && echo "+ mv $logfile ${logfile}.${SUMMARY_LOGNAME}" + if mv $logfile ${logfile}.${SUMMARY_LOGNAME} + then + kill -HUP $pid + else + _error "problems moving logfile \"$logfile\" for host \"$host\"" + touch $tmp.err + fi + fi + + # and cull old logfiles + # + if [ X"$CULLAFTER" != X"forever" ] + then + _date_filter $CULLAFTER >$tmp.list + if [ -s $tmp.list ] + then + if $VERBOSE + then + echo "Log files older than $CULLAFTER days being removed ..." + fmt <$tmp.list | sed -e 's/^/ /' + fi + if $SHOWME + then + cat $tmp.list | xargs echo + rm -f + else + cat $tmp.list | xargs rm -f + fi + fi + fi + + # finally, compress old log files + # (after cull - don't compress unnecessarily) + # + if [ ! -z "$COMPRESSAFTER" ] + then + _date_filter $COMPRESSAFTER | egrep -v "$COMPRESSREGEX" >$tmp.list + if [ -s $tmp.list ] + then + if $VERBOSE + then + echo "Log files older than $COMPRESSAFTER days being compressed ..." 
+ fmt <$tmp.list | sed -e 's/^/ /' + fi + if $SHOWME + then + cat $tmp.list | xargs echo + $COMPRESS + else + cat $tmp.list | xargs $COMPRESS + fi + fi + fi + + _unlock + +done + +[ -f $tmp.err ] && status=1 +exit Index: devpcp/src/pmlogctl/pmlogger_daily.sh =================================================================== --- devpcp.orig/src/pmlogctl/pmlogger_daily.sh 2007-03-16 17:19:39.829140750 +1100 +++ devpcp/src/pmlogctl/pmlogger_daily.sh 2007-03-20 16:01:05.949191750 +1100 @@ -58,9 +58,9 @@ CULLAFTER=14 # default compression program # -COMPRESS=compress +COMPRESS=bzip2 COMPRESSAFTER="" -COMPRESSREGEX=".meta$|.index$|.Z$|.gz$" +COMPRESSREGEX=".meta$|.index$|.Z$|.gz$|.bz2$|.zip$" # threshold size to roll $PCP_LOG_DIR/NOTICES # @@ -417,7 +417,7 @@ s/^\([A-Za-z][A-Za-z0-9_]*\)=/export \1; then pflag='' [ $primary = y ] && pflag=' -P' - echo "Check pmlogger$flag -h $host ... in $dir ..." + echo "Check pmlogger$pflag -h $host ... in $dir ..." fi if [ ! -d $dir ] Index: devpcp/src/pmie/pmie_check.sh =================================================================== --- devpcp.orig/src/pmie/pmie_check.sh 2007-03-16 17:19:39.773137250 +1100 +++ devpcp/src/pmie/pmie_check.sh 2007-03-20 16:01:05.881187500 +1100 @@ -72,7 +72,7 @@ prog=`basename $0` CONTROL=$PCP_VAR_DIR/config/pmie/control # determine real name for localhost -LOCALHOSTNAME=`hostname` +LOCALHOSTNAME=`hostname | sed -e 's/\..*//'` if [ -z "$LOCALHOSTNAME" ] then echo "$prog: Error: cannot determine hostname, giving up" @@ -152,7 +152,7 @@ _message() case $1 in 'restart') - echo -n "Restarting pmie for host \"$host\" ..." + $PCP_ECHO_PROG $PCP_ECHO_N "Restarting pmie for host \"$host\" ..." 
;; esac } Index: devpcp/man/man1/pmlogger_daily.1 =================================================================== --- devpcp.orig/man/man1/pmlogger_daily.1 2007-03-16 17:19:39.913146000 +1100 +++ devpcp/man/man1/pmlogger_daily.1 2007-03-20 15:31:50.227466000 +1100 @@ -100,7 +100,7 @@ option specifies the number of days afte files, and the .B \-X option specifies the program to use for compression \- by default this is -.BR compress (1). +.BR bzip2 (1). Use of the .B \-Y option allows a regular expression to be specified causing files in @@ -108,11 +108,10 @@ the set of files matched for compression only the data file to be compressed, and also prevents the program from attempting to compress it more than once. The default .I regex -is ".meta$|.index$|.Z$|.gz$" \- such files are filtered using the +is ".meta$|.index$|.Z$|.gz$|.bz2$|.zip$" \- such files are filtered using the .B \-v option to .BR egrep (1). -.B .PP In addition, if the PCP ``notices'' file (\c @@ -341,9 +340,9 @@ for root if automated PCP archive log ma .nf .ft CW # daily processing of archive logs -10 0 * * * $PCP_BINADM_DIR/pmlogger_daily +14 0 * * * $PCP_BINADM_DIR/pmlogger_daily # every 30 minutes, check pmlogger instances are running -25,55 * * * * $PCP_BINADM_DIR/pmlogger_check +28,58 * * * * $PCP_BINADM_DIR/pmlogger_check .ft 1 .fi .PP @@ -444,11 +443,11 @@ other configuration files suited for particular PCP monitoring tools, add-on products and application environments .TP -.BI $PCP_LOG_DIR/pmlogger hostname +.BI $PCP_LOG_DIR/pmlogger/ hostname default location for archives of performance information collected from the host .I hostname .TP -.BI $PCP_LOG_DIR/pmlogger hostname /lock +.BI $PCP_LOG_DIR/pmlogger/ hostname /lock transient lock file to guarantee mutual exclusion during .B pmlogger administration for the host @@ -459,7 +458,7 @@ nor .B pmlogger_check are running .TP -.BI $PCP_LOG_DIR/pmlogger hostname /Latest +.BI $PCP_LOG_DIR/pmlogger/ hostname /Latest PCP archive folio created by 
.BR mkaf (1) for the most recently launched archive containing performance metrics from @@ -485,7 +484,7 @@ configuration file, as described in .BR pcp.conf (4). .SH SEE ALSO -.BR compress (1), +.BR bzip2 (1), .BR cron (1), .BR egrep (1), .BR PCP (1), Index: devpcp/man/man1/pmie_check.1 =================================================================== --- devpcp.orig/man/man1/pmie_check.1 2007-03-16 17:19:39.969149500 +1100 +++ devpcp/man/man1/pmie_check.1 2007-03-20 15:32:02.180213000 +1100 @@ -33,20 +33,76 @@ .rr X \} .SH NAME -\f3pmie_check\f1 -\- administration of the Performance Co-Pilot inference engine +\f3pmie_check\f1, +\f3pmie_daily\f1 \- administration of the Performance Co-Pilot inference engine .SH SYNOPSIS .B $PCP_BINADM_DIR/pmie_check [\f3\-NsV\f1] [\f3\-c\f1 \f2control\f1] +.br +.B $PCP_BINADM_DIR/pmie_daily +[\f3\-NV\f1] +[\f3\-c\f1 \f2control\f1] +[\f3\-k\f1 \f2discard\f1] +[\f3\-x\f1 \f2compress\f1] +[\f3\-X\f1 \f2program\f1] +[\f3\-Y\f1 \f2regex\f1] +.br .SH DESCRIPTION -This shell script and associated control file may be used to +This series of shell scripts and associated control files may be used to create a customized regime of administration and management for the Performance Co-Pilot (see .BR PCPintro (1)) inference engine, .BR pmie (1). .PP +.B pmie_daily +is intended to be run once per day, preferably in the early morning, as +soon after midnight as practicable. Its task is to rotate the log files +for the running +.B pmie +processes \- these files may grow without bound if the +``print'' action is used, or any other +.B pmie +action writes to its stdout/stderr streams. +After some period, old +.B pmie +log files are discarded. +This period is 14 days by default, but may be changed using the +.B \-k +option. Two special values are recognized for the period (\c +.IR discard ), +namely +.B 0 +to keep no log files beyond the current one, and +.B forever +to prevent any log files being discarded. 
+.PP +Log files can optionally be compressed after some period (\c +.IR compress ), +to conserve disk space. This is particularly useful for large numbers of +.B pmie +processes under the control of +.BR pmie_check . +The +.B \-x +option specifies the number of days after which to compress archive data +files, and the +.B \-X +option specifies the program to use for compression \- by default this is +.BR bzip2 (1). +Use of the +.B \-Y +option allows a regular expression to be specified causing files in +the set of files matched for compression to be omitted \- this allows +only the data file to be compressed, and also prevents the program from +attempting to compress it more than once. The default +.I regex +is ".meta$|.index$|.Z$|.gz$|.bz2$|.zip$" \- such files are filtered using the +.B \-v +option to +.BR egrep (1). +.PP .B pmie_check may be run at any time, and is intended to check that the desired set of @@ -58,8 +114,11 @@ option provides the reverse functionalit .B pmie processes to be cleanly shutdown. .PP +Both .B pmie_check -is controlled by a PCP inference engine control file that specifies the +and +.B pmie_daily +are controlled by a PCP inference engine control file that specifies the .B pmie instances to be managed. The default control file is .B $PCP_VAR_DIR/config/pmie/control @@ -184,8 +243,8 @@ and another monitoring performance metri .PP .nf .ft CW -wobbly n $PCP_LOG_DIR/pmie/wobbly -c $PCP_VAR_DIR/config/pmie/config.default -splat n $PCP_LOG_DIR/pmie/splat -c $PCP_LOG_DIR/pmie/splat/cpu.conf +wobbly n PCP_LOG_DIR/pmie/wobbly -c pmie/config.default +splat n PCP_LOG_DIR/pmie/splat -c pmie/splat/cpu.conf .ft 1 .fi .PP @@ -199,8 +258,10 @@ and shown below. .PP .nf .ft CW +# daily processing of pmie logs +14 0 * * * $PCP_BINADM_DIR/pmie_daily # every 30 minutes, check pmie instances are running -25,55 * * * * $PCP_BINADM_DIR/pmie_check +28,58 * * * * $PCP_BINADM_DIR/pmie_check .ft 1 .fi .PP @@ -275,15 +336,20 @@ other than root. 
.B $PCP_VAR_DIR/config/pmie/crontab sample crontab for automated script execution by root .TP -.BI logfile .lock -transient lock file which is named using the control-specified +.BI $PCP_LOG_DIR/pmie/ hostname +default location for the pmie log file for the host +.I hostname +.TP +.BI $PCP_LOG_DIR/pmie/ hostname /lock +transient lock file to guarantee mutual exclusion during .B pmie -.I logfile -names, and is used to guarantee mutual exclusion during -.B pmie_check -execution \- if present, can be safely removed if +administration for the host +.I hostname +\- if present, can be safely removed if neither +.B pmie_daily +nor .B pmie_check -is not running +are running .TP .B $PCP_LOG_DIR/NOTICES PCP ``notices'' file used by Index: devpcp/src/include/impl.h =================================================================== --- devpcp.orig/src/include/impl.h 2007-03-16 17:19:39.881144000 +1100 +++ devpcp/src/include/impl.h 2007-03-16 17:19:46.045529250 +1100 @@ -202,6 +202,7 @@ extern int __pmHasPMNSFileChanged(const /* standard log file set up */ extern FILE *__pmOpenLog(const char *, const char *, FILE *, int *); +extern FILE *__pmRotateLog(const char *, const char *, FILE *, int *); /* make __pmNotifyErr also add entries to syslog */ extern void __pmSyslog(int); /* standard error, warning and info wrapper for syslog(3C) */ Index: devpcp/src/libpcp/src/util.c =================================================================== --- devpcp.orig/src/libpcp/src/util.c 2007-03-16 17:19:39.837141250 +1100 +++ devpcp/src/libpcp/src/util.c 2007-03-19 15:51:15.064601250 +1100 @@ -57,32 +57,6 @@ char *pmProgname = "pcp"; /* the real static int vpmprintf(const char *, va_list); -static void -onexit(void) -{ - int i; - time_t now; - - /* - * there is a race condition here ... but the worse that can happen - * is (a) no "Log finished" message, or (b) _two_ "Log finished" - * messages ... 
neither case is serious enough to warrant a mutex guard - */ - if (++done_exit != 1) - return; -#if defined(IRIX5_3) - if (lucky_pid != getpid()) { - done_exit--; - return; - } -#endif - - (void)time(&now); - for (i = 0; i < nfilelog; i++) { - fprintf(filelog[i], "\nLog finished %s", ctime(&now)); - } -} - /* * if onoff == 1, logging is to syslog and stderr, else logging is * just to stderr (this is the default) @@ -167,12 +141,56 @@ __pmNotifyErr(int priority, const char * pmflush(); } -FILE * -__pmOpenLog(const char *progname, const char *logname, FILE *oldstream, - int *status) +static void +logheader(const char *progname, FILE *log, const char *act) { time_t now; char host[MAXHOSTNAMELEN]; + + setlinebuf(log); /* line buffering for log files */ + gethostname(host, MAXHOSTNAMELEN); + host[MAXHOSTNAMELEN-1] = '\0'; + time(&now); + fprintf(log, "Log for %s on %s %s %s\n", progname, host, act, ctime(&now)); +} + +static void +logfooter(FILE *log, const char *act) +{ + time_t now; + + time(&now); + fprintf(log, "\nLog %s %s", act, ctime(&now)); +} + +static void +logonexit(void) +{ + int i; + + /* + * there is a race condition here ... but the worse that can happen + * is (a) no "Log finished" message, or (b) _two_ "Log finished" + * messages ... 
neither case is serious enough to warrant a mutex guard + */ + if (++done_exit != 1) + return; +#if defined(IRIX5_3) + if (lucky_pid != getpid()) { + done_exit--; + return; + } +#endif + + for (i = 0; i < nfilelog; i++) + logfooter(filelog[i], "finished"); +} + +/* common code shared by __pmRotateLog and __pmOpenLog */ +static FILE * +logreopen(const char *progname, const char *logname, FILE *oldstream, + int *status) +{ int oldfd; int dupoldfd; FILE *dupoldstream = oldstream; @@ -223,13 +241,15 @@ __pmOpenLog(const char *progname, const *status = 1; } close(dupoldfd); - setlinebuf(oldstream); /* line buffering for log files */ - (void)gethostname(host, MAXHOSTNAMELEN); - host[MAXHOSTNAMELEN-1] = '\0'; - time(&now); - fprintf(oldstream, "Log for %s on %s started %s\n", - progname, host, ctime(&now)); + return oldstream; +} +FILE * +__pmOpenLog(const char *progname, const char *logname, FILE *oldstream, + int *status) +{ + oldstream = logreopen(progname, logname, oldstream, status); + logheader(progname, oldstream, "started"); /* * atexit() race condition in IRIX 5.3 measn we have to be very careful @@ -237,18 +257,36 @@ __pmOpenLog(const char *progname, const */ nfilelog++; if (nfilelog == 1) { - atexit(onexit); + atexit(logonexit); #if defined(IRIX5_3) lucky_pid = getpid(); #endif } - if ((filelog = (FILE **)realloc(filelog, nfilelog * sizeof(FILE *))) == NULL) { + filelog = (FILE **)realloc(filelog, nfilelog * sizeof(FILE *)); + if (filelog == NULL) { __pmNoMem("__pmOpenLog", nfilelog * sizeof(FILE *), PM_FATAL_ERR); /*NOTREACHED*/ } filelog[nfilelog-1] = oldstream; + return oldstream; +} +FILE * +__pmRotateLog(const char *progname, const char *logname, FILE *oldstream, + int *status) +{ + int i; + + for (i = 0; i < nfilelog; i++) { + if (oldstream == filelog[i]) { + logfooter(oldstream, "rotated"); /* old */ + oldstream = logreopen(progname, logname, oldstream, status); + logheader(progname, oldstream, "rotated"); /* new */ + filelog[i] = oldstream; + break; + } 
+ } return oldstream; } Index: devpcp/man/man3/pmopenlog.3 =================================================================== --- devpcp.orig/man/man3/pmopenlog.3 2007-03-16 17:32:06.783822500 +1100 +++ devpcp/man/man3/pmopenlog.3 2007-03-16 17:32:30.753320500 +1100 @@ -60,11 +60,11 @@ event of an error, this will be .I oldstream unchanged and .I status -will be 1. +will be 0. .PP For success, .I status -is 0, a standard preamble is written to +is 1, a standard preamble is written to .I logname .ti +0.5i .ft B --=-6Y2ToNLhbNj3kdN3X4tr Content-Disposition: attachment; filename=314-pmie_daily Content-Type: text/x-patch; name=314-pmie_daily; charset=UTF-8 Content-Transfer-Encoding: 7bit Index: qa/314 =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ qa/314 2007-03-21 08:51:42.478738000 +1100 @@ -0,0 +1,68 @@ +#! /bin/sh +# PCP QA Test No. 314 +# Exercise pmie_daily functionality - log rotation +# +# Copyright (c) 2007 Aconex. All Rights Reserved. +# +# creator +owner=nathans + +seq=`basename $0` +echo "QA output created by $seq" + +# get standard filters +. ./common.filter +. ./common.check +. ./localconfig + +tmp=/tmp/$$ +here=`pwd` +sudo=$here/sudo +status=1 # failure is the default! +trap "rm -fr $tmp.* /tmp/$seq; exit \$status" 0 1 2 3 15 + +# create a pmie config file, causing frequent output (to log) +cat > $tmp.config << EOF1 +delta = 0.2 seconds; +fetched = simple.numfetch; +EOF1 + +# create pmie control files and test out various good/bad conditions + +cat > $tmp.control << EOF2 +\$version=1.0 +LOCALHOSTNAME n /tmp/$seq/1.good.log -v -c $tmp.config +EOF2 + +# real QA test starts here +$sudo killall -TERM pmie 2>/dev/null +rm -fr /tmp/$seq && mkdir /tmp/$seq || exit 1 +pmstore simple.numfetch 0 >/dev/null + +# fire em all up +echo "Starting pmie process" +pmie_check -c $tmp.control +sleep 2 # fill original log a bit + +echo "Rotate, rotate..." 
+previous=`pmdate -1d %Y%m%d` +pmie_daily -c $tmp.control +sleep 2 # fill rotated log a bit + +grep rotated /tmp/$seq/1.good.log >/dev/null \ + || echo "First log not rotated?" +grep rotated /tmp/$seq/1.good.log.$previous >/dev/null \ + || echo "New log not started?" + +echo "Shutdown pmie process" +pmie_check -c $tmp.control -s + +# look for data in each log file, checking rotation actually did something +oldlines=`wc -l < /tmp/$seq/1.good.log.$previous 2>/dev/null || echo 0` +newlines=`wc -l < /tmp/$seq/1.good.log 2>/dev/null || echo 0` +_within_tolerance "Old logfile line count" "$oldlines" 70 %20 -v +_within_tolerance "New logfile line count" "$newlines" 30 %20 -v + +# success, all done +status=0 +exit Index: qa/314.out =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ qa/314.out 2007-03-20 13:26:58.567266500 +1100 @@ -0,0 +1,6 @@ +QA output created by 314 +Starting pmie process +Rotate, rotate... +Shutdown pmie process +Old logfile line count is in range +New logfile line count is in range --=-6Y2ToNLhbNj3kdN3X4tr Content-Disposition: attachment; filename=315-pmie_daily Content-Type: text/x-patch; name=315-pmie_daily; charset=UTF-8 Content-Transfer-Encoding: 7bit Index: qa/315 =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ qa/315 2007-03-21 08:51:06.452486500 +1100 @@ -0,0 +1,66 @@ +#! /bin/sh +# PCP QA Test No. 315 +# exercise pmie log compression thru pmie_daily +# +# Copyright (c) 1995-2002 Silicon Graphics, Inc. All Rights Reserved. +# Portions Copyright (c) 2007 Aconex. All Rights Reserved. +# +# creator +owner=nathans + +seq=`basename $0` +echo "QA output created by $seq" + +# get standard environment, filters and checks +. 
./common.filter + +tmp=/tmp/$$ +sudo=`pwd`/sudo + +_cleanup() +{ + [ -d $tmp.distdir ] && rm -fr $tmp.distdir + [ -d $tmp.relaydir ] && rm -fr $tmp.relaydir + rm -f $tmp.* +} + +status=1 # failure is the default! +trap "_cleanup; exit \$status" 0 1 2 3 15 + +# create test control file, directories and populate with dummy logfiles +cat >$tmp.ctl<; Tue, 20 Mar 2007 22:29:19 -0700 Received: from edge (unknown [203.89.192.141]) by postoffice.aconex.com (Postfix) with ESMTP id 894AAAAC2BD; Wed, 21 Mar 2007 16:29:14 +1100 (EST) Subject: [PATCH] fix filesys.full metric From: Nathan Scott Reply-To: nscott@aconex.com To: Michael Newton Cc: pcp@oss.sgi.com Content-Type: multipart/mixed; boundary="=-zRgFTMvfGhwygvc7bPx4" Organization: Aconex Date: Wed, 21 Mar 2007 16:29:49 +1100 Message-Id: <1174454989.5051.374.camel@edge> Mime-Version: 1.0 X-Mailer: Evolution 2.6.3 X-archive-position: 1139 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: nscott@aconex.com Precedence: bulk X-list: pcp Content-Length: 5391 Lines: 142 --=-zRgFTMvfGhwygvc7bPx4 Content-Type: text/plain Content-Transfer-Encoding: 7bit Hi, Ken noticed a filesystem fill up but not trigger one of the rules we had monitoring it. It turns out that the filesys.free (%) value is not taking into account the difference in the statfs(2) f_bavail and f_bfree fields. It also turns out that XFS always reports these two values as the same (doesn't allow extra space for root), which is probably why no SGI customer has ever reported it. ;-) I've looked into the way the GNU df(1) source implements this code, and have updated the PCP code to use a similar algorithm. I've also cleaned up some of the related filesys metric value calculations, as the casting was causing me some code readability issues. cheers. 
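The difference between the two calculations can be sketched in isolation as follows. This is a minimal illustration, not the PMDA code: the function names percent_full and percent_full_old, and the guards against a zero divisor, are assumptions made for the example; only the f_blocks, f_bfree and f_bavail fields of statfs(2) come from the description above.

```c
#include <stdint.h>

/*
 * Percentage of a filesystem in use, as seen by an unprivileged user.
 * blocks, bfree and bavail stand for the statfs(2) fields f_blocks,
 * f_bfree and f_bavail.  The function name and the zero-total guard
 * are assumptions made for this sketch.
 */
double
percent_full(uint64_t blocks, uint64_t bfree, uint64_t bavail)
{
    uint64_t used = blocks - bfree;     /* blocks currently holding data */
    uint64_t total = used + bavail;     /* space usable by non-root users */

    if (total == 0)
        return 0.0;                     /* avoid dividing by zero */
    return (100.0 * (double)used) / (double)total;
}

/*
 * The earlier calculation, for comparison: it divides by f_blocks and
 * so ignores any blocks that are free but reserved for root.
 */
double
percent_full_old(uint64_t blocks, uint64_t bfree)
{
    if (blocks == 0)
        return 0.0;
    return 100.0 - 100.0 * (double)bfree / (double)blocks;
}
```

With 100 blocks, 20 of them free but only 10 available to ordinary users, the older formula gives 80% while the newer one gives roughly 88.9%. Once f_bavail drops to zero the newer formula reports 100% even though f_bfree is still non-zero, which is the case a rule watching for a full filesystem needs to catch.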
-- Nathan --=-zRgFTMvfGhwygvc7bPx4 Content-Disposition: attachment; filename=fix-filesys-full-metric Content-Type: text/x-patch; name=fix-filesys-full-metric; charset=UTF-8 Content-Transfer-Encoding: 7bit Index: devpcp/src/pmdas/linux/pmda.c =================================================================== --- devpcp.orig/src/pmdas/linux/pmda.c 2007-03-21 16:05:37.633833500 +1100 +++ devpcp/src/pmdas/linux/pmda.c 2007-03-21 16:05:50.338627500 +1100 @@ -3523,14 +3523,19 @@ linux_fetchCallBack(pmdaMetric *mdesc, u } switch (idp->item) { + __uint64_t ull, used; + case 1: /* filesys.capacity */ - atom->ull = ((__uint64_t)sbuf->f_blocks) * sbuf->f_bsize / 1024; + ull = (__uint64_t)sbuf->f_blocks; + atom->ull = ull * sbuf->f_bsize / 1024; break; case 2: /* filesys.used */ - atom->ull = ((__uint64_t)(sbuf->f_blocks - sbuf->f_bfree)) * sbuf->f_bsize / 1024; + used = (__uint64_t)(sbuf->f_blocks - sbuf->f_bfree); + atom->ull = used * sbuf->f_bsize / 1024; break; case 3: /* filesys.free */ - atom->ull = ((__uint64_t)(sbuf->f_bfree)) * sbuf->f_bsize / 1024; + ull = (__uint64_t)sbuf->f_bfree; + atom->ull = ull * sbuf->f_bsize / 1024; break; case 4: /* filesys.maxfiles */ atom->ul = sbuf->f_files; @@ -3545,14 +3550,17 @@ linux_fetchCallBack(pmdaMetric *mdesc, u atom->cp = filesys.mounts[i].path; break; case 8: /* filesys.full */ - atom->d = 100.0 - 100.0 * (double)sbuf->f_bfree / (double)sbuf->f_blocks; + used = (__uint64_t)(sbuf->f_blocks - sbuf->f_bfree); + ull = used + (__uint64_t)sbuf->f_bavail; + atom->d = (100.0 * (double)used) / (double)ull; + break; + case 9: /* filesys.blocksize -- added by Mike Mason */ + atom->ul = sbuf->f_bsize; + break; + case 10: /* filesys.avail -- added by Mike Mason */ + ull = (__uint64_t)sbuf->f_bavail; + atom->ull = ull * sbuf->f_bsize / 1024; break; - case 9: /* filesys.blocksize -- added by Mike Mason */ - atom->ul = sbuf->f_bsize; - break; - case 10: /* filesys.avail -- added by Mike Mason */ - atom->ull = ((__uint64_t)(sbuf->f_bavail)) * 
sbuf->f_bsize / 1024; - break; default: return PM_ERR_PMID; } Index: devpcp/src/pmdas/darwin/pmda.c =================================================================== --- devpcp.orig/src/pmdas/darwin/pmda.c 2007-03-21 16:09:25.524075750 +1100 +++ devpcp/src/pmdas/darwin/pmda.c 2007-03-21 16:13:47.764464750 +1100 @@ -830,6 +830,8 @@ fetch_uname(unsigned int item, pmAtomVal static inline int fetch_filesys(unsigned int item, unsigned int inst, pmAtomValue *atom) { + __uint64_t ull, used; + if (mach_fs_error) return mach_fs_error; if (item == 31) { /* hinv.nfilesys */ @@ -842,16 +844,16 @@ fetch_filesys(unsigned int item, unsigne return PM_ERR_INST; switch (item) { case 32: /* filesys.capacity */ - atom->ull = ((__uint64_t)mach_fs[inst].f_blocks) * - mach_fs[inst].f_bsize >> 10; + ull = (__uint64_t)mach_fs[inst].f_blocks; + atom->ull = ull * mach_fs[inst].f_bsize >> 10; return 1; case 33: /* filesys.used */ - atom->ull = ((__uint64_t)(mach_fs[inst].f_blocks - - mach_fs[inst].f_bfree)) * mach_fs[inst].f_bsize >> 10; + used = (__uint64_t)(mach_fs[inst].f_blocks - mach_fs[inst].f_bfree); + atom->ull = used * mach_fs[inst].f_bsize >> 10; return 1; case 34: /* filesys.free */ - atom->ull = ((__uint64_t)(mach_fs[inst].f_bfree)) * - mach_fs[inst].f_bsize >> 10; + ull = (__uint64_t)mach_fs[inst].f_bfree; + atom->ull = ull * mach_fs[inst].f_bsize >> 10; return 1; case 35: /* filesys.usedfiles */ atom->ul = mach_fs[inst].f_files; @@ -863,16 +865,16 @@ fetch_filesys(unsigned int item, unsigne atom->cp = mach_fs[inst].f_mntonname; return 1; case 38: /* filesys.full */ - atom->d = (!mach_fs[inst].f_blocks) ? 
0 : - (100.0 - (100.0 * (double)mach_fs[inst].f_bfree / - (double)mach_fs[inst].f_blocks)); + used = (__uint64_t)(mach_fs[inst].f_blocks - mach_fs[inst].f_bfree); + ull = used + (__uint64_t)mach_fs[inst].f_bavail; + atom->d = (100.0 * (double)used) / (double)ull; return 1; case 39: /* filesys.blocksize */ atom->ul = mach_fs[inst].f_bsize; return 1; case 40: /* filesys.avail */ - atom->ull = ((__uint64_t)mach_fs[inst].f_bavail) * - mach_fs[inst].f_bsize >> 10; + ull = (__uint64_t)mach_fs[inst].f_bavail; + atom->ull = ull * mach_fs[inst].f_bsize >> 10; return 1; case 41: /* filesys.type */ atom->cp = mach_fs[inst].f_fstypename; --=-zRgFTMvfGhwygvc7bPx4-- From nscott@aconex.com Wed Mar 21 16:25:49 2007 Received: with ECARTIS (v1.0.0; list pcp); Wed, 21 Mar 2007 16:25:57 -0700 (PDT) X-Spam-oss-Status: No, score=0.0 required=5.0 tests=BAYES_50 autolearn=no version=3.2.0-pre1-r499012 Received: from postoffice.aconex.com (mail.app.aconex.com [203.89.192.138]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l2LNPl6p018398 for ; Wed, 21 Mar 2007 16:25:49 -0700 Received: from edge (unknown [203.89.192.141]) by postoffice.aconex.com (Postfix) with ESMTP id 45700AAC262; Thu, 22 Mar 2007 10:25:45 +1100 (EST) Subject: [PATCH] fix Linux kernel.all.pswitch From: Nathan Scott Reply-To: nscott@aconex.com To: Michael Newton Cc: pcp@oss.sgi.com Content-Type: multipart/mixed; boundary="=-XASv/WkOVmj/abMX/nUl" Organization: Aconex Date: Thu, 22 Mar 2007 10:26:25 +1100 Message-Id: <1174519585.5051.406.camel@edge> Mime-Version: 1.0 X-Mailer: Evolution 2.6.3 X-archive-position: 1150 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: nscott@aconex.com Precedence: bulk X-list: pcp Content-Length: 1086 Lines: 37 --=-XASv/WkOVmj/abMX/nUl Content-Type: text/plain Content-Transfer-Encoding: 7bit Hi, This fixes a cut & paste error in the Linux PMDA, where it is exporting the "kernel.all.intr" value as the number of 
context switches also. cheers. -- Nathan --=-XASv/WkOVmj/abMX/nUl Content-Disposition: attachment; filename=fix-linux-pswitch Content-Type: text/x-patch; name=fix-linux-pswitch; charset=UTF-8 Content-Transfer-Encoding: 7bit Index: devpcp/src/pmdas/linux/pmda.c =================================================================== --- devpcp.orig/src/pmdas/linux/pmda.c 2007-03-22 10:11:43.558449250 +1100 +++ devpcp/src/pmdas/linux/pmda.c 2007-03-22 10:12:03.475694000 +1100 @@ -3129,7 +3129,7 @@ linux_fetchCallBack(pmdaMetric *mdesc, u _pm_assign_utype(_pm_intr_size, atom, proc_stat.intr); break; case 13: /* ctxt */ - _pm_assign_utype(_pm_ctxt_size, atom, proc_stat.intr); + _pm_assign_utype(_pm_ctxt_size, atom, proc_stat.ctxt); break; case 14: /* processes */ _pm_assign_ulong(atom, proc_stat.processes); --=-XASv/WkOVmj/abMX/nUl-- From nscott@aconex.com Wed Mar 21 17:35:06 2007 Received: with ECARTIS (v1.0.0; list pcp); Wed, 21 Mar 2007 17:35:10 -0700 (PDT) X-Spam-oss-Status: No, score=0.1 required=5.0 tests=BAYES_50,J_CHICKENPOX_93 autolearn=no version=3.2.0-pre1-r499012 Received: from postoffice.aconex.com (mail.app.aconex.com [203.89.192.138]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l2M0Z36p000456 for ; Wed, 21 Mar 2007 17:35:05 -0700 Received: from edge (unknown [203.89.192.141]) by postoffice.aconex.com (Postfix) with ESMTP id 4336AAAC354; Thu, 22 Mar 2007 11:35:02 +1100 (EST) Subject: Possibly quirky libpcp_pmda behavior? 
From: Nathan Scott Reply-To: nscott@aconex.com To: dchatterton@aconex.com Cc: pcp@oss.sgi.com Content-Type: multipart/mixed; boundary="=-LactvBahrFAR4SBRRv/V" Organization: Aconex Date: Thu, 22 Mar 2007 11:35:43 +1100 Message-Id: <1174523743.5051.439.camel@edge> Mime-Version: 1.0 X-Mailer: Evolution 2.6.3 X-archive-position: 1151 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: nscott@aconex.com Precedence: bulk X-list: pcp Content-Length: 2742 Lines: 88 --=-LactvBahrFAR4SBRRv/V Content-Type: text/plain Content-Transfer-Encoding: 7bit Hi Dave, (IIRC, you wrote this bit of code, so maybe you'll remember). I have an agent which is returning PM_ERR_APPVERSION for some of its metric values, for metrics not supported on a particular platform version. >From pmapi.h: #define PM_ERR_APPVERSION (-PM_ERR_BASE-5) /* Metric not supported by this version of monitored application */ And the code in question in libpcp_pmda is here (callback.c)... int pmdaFetch(int numpmid, pmID pmidlist[], pmResult **resp, pmdaExt *pmda) { ... if ((sts = (*(pmda->e_fetchCallBack))(metap, inst, &atom)) < 0) { if (sts == PM_ERR_PMID) __pmNotifyErr(LOG_ERR, "pmdaFetch: PMID %s not handled by fetch callback\n", pmIDStr(dp->pmid)); else if (sts == PM_ERR_INST) { #ifdef PCP_DEBUG if (pmDebug & DBG_TRACE_LIBPMDA) { __pmNotifyErr(LOG_ERR, "pmdaFetch: Instance %d of PMID %s not handled by fetch callback\n", inst, pmIDStr(dp->pmid)); } #endif } else __pmNotifyErr(LOG_ERR, "pmdaFetch: Fetch callback error: %s\n", pmErrStr(sts)); } ... ... if (sts >= 0) { vset->valfmt = sts; j++; } The problem I'm facing is that I end up getting one line in the pmda's logfile every time a fetch is done for an instance tree, for every metric that isn't supported for that platform version. In some situations this ends up being frequent, and I end up with a huge log file after a few days/weeks, and none of the messages are helpful, since the error is expected. 
So, the pmResult structure is filled in correctly, and the clients get the correct error for those metrics, but it seems either the library is being too chatty (at least for this particular errno), or I should be coding the fetch callback differently. Thoughts? We could resolve my immediate problem using the attached patch, but maybe I'm not approaching this the right way (maybe the fetch method itself should do this?) - any insight would be much appreciated! thanks. -- Nathan --=-LactvBahrFAR4SBRRv/V Content-Disposition: attachment; filename=quiet-libpcp_pmda-appversion Content-Type: text/x-patch; name=quiet-libpcp_pmda-appversion; charset=UTF-8 Content-Transfer-Encoding: 7bit Index: devpcp/src/libpcp_pmda/src/callback.c =================================================================== --- devpcp.orig/src/libpcp_pmda/src/callback.c 2007-03-22 11:28:08.464988000 +1100 +++ devpcp/src/libpcp_pmda/src/callback.c 2007-03-22 11:28:59.028148000 +1100 @@ -534,7 +534,7 @@ pmdaFetch(int numpmid, pmID pmidlist[], } #endif } - else + else if (sts != PM_ERR_APPVERSION) __pmNotifyErr(LOG_ERR, "pmdaFetch: Fetch callback error: %s\n", pmErrStr(sts)); --=-LactvBahrFAR4SBRRv/V-- From dchatterton@aconex.com Wed Mar 21 18:44:29 2007 Received: with ECARTIS (v1.0.0; list pcp); Wed, 21 Mar 2007 18:44:34 -0700 (PDT) X-Spam-oss-Status: No, score=0.1 required=5.0 tests=BAYES_50,J_CHICKENPOX_93 autolearn=no version=3.2.0-pre1-r499012 Received: from postoffice.aconex.com (mail.app.aconex.com [203.89.192.138]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l2M1iR6p013498 for ; Wed, 21 Mar 2007 18:44:29 -0700 Received: from DCHATTERTONLAPTOP (unknown [203.89.192.141]) by postoffice.aconex.com (Postfix) with ESMTP id 4A1B0AACCE5; Thu, 22 Mar 2007 12:24:35 +1100 (EST) From: "David Chatterton" To: Cc: Subject: RE: Possibly quirky libpcp_pmda behavior? 
Date: Thu, 22 Mar 2007 12:24:42 +1100 Message-ID: <002801c76c20$de5f76a0$4305a8c0@DCHATTERTONLAPTOP> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Mailer: Microsoft Office Outlook 11 In-Reply-To: <1174523743.5051.439.camel@edge> X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3028 Thread-Index: AcdsGbM2k32LYMqNS3u/Kyr2orTqxwABp1/A X-archive-position: 1152 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: dchatterton@aconex.com Precedence: bulk X-list: pcp Content-Length: 2502 Lines: 85 How about this? That way you can still trap these if you are debugging an agent. else if (sts != PM_ERR_APPVERSION) { __pmNotifyErr(LOG_ERR, "pmdaFetch: Fetch callback error: %s\n", pmErrStr(sts)); } else if (pmDebug & DBG_TRACE_LIBPMDA) { __pmNotifyErr(LOG_ERR, "pmdaFetch: Unsupported metric %s\n", pmIDStr(dp->pmid)); } David -----Original Message----- From: Nathan Scott [mailto:nscott@aconex.com] Sent: Thursday, 22 March 2007 11:36 AM To: dchatterton@aconex.com Cc: pcp@oss.sgi.com Subject: Possibly quirky libpcp_pmda behavior? Hi Dave, (IIRC, you wrote this bit of code, so maybe you'll remember). I have an agent which is returning PM_ERR_APPVERSION for some of its metric values, for metrics not supported on a particular platform version. >From pmapi.h: #define PM_ERR_APPVERSION (-PM_ERR_BASE-5) /* Metric not supported by this version of monitored application */ And the code in question in libpcp_pmda is here (callback.c)... int pmdaFetch(int numpmid, pmID pmidlist[], pmResult **resp, pmdaExt *pmda) { ... 
if ((sts = (*(pmda->e_fetchCallBack))(metap, inst, &atom)) < 0) { if (sts == PM_ERR_PMID) __pmNotifyErr(LOG_ERR, "pmdaFetch: PMID %s not handled by fetch callback\n", pmIDStr(dp->pmid)); else if (sts == PM_ERR_INST) { #ifdef PCP_DEBUG if (pmDebug & DBG_TRACE_LIBPMDA) { __pmNotifyErr(LOG_ERR, "pmdaFetch: Instance %d of PMID %s not handled by fetch callback\n", inst, pmIDStr(dp->pmid)); } #endif } else __pmNotifyErr(LOG_ERR, "pmdaFetch: Fetch callback error: %s\n", pmErrStr(sts)); } ... ... if (sts >= 0) { vset->valfmt = sts; j++; } The problem I'm facing is that I end up getting one line in the pmda's logfile every time a fetch is done for an instance tree, for every metric that isn't supported for that platform version. In some situations this ends up being frequent, and I end up with a huge log file after a few days/weeks, and none of the messages are helpful, since the error is expected. So, the pmResult structure is filled in correctly, and the clients get the correct error for those metrics, but it seems either the library is being too chatty (at least for this particular errno), or I should be coding the fetch callback differently. Thoughts? We could resolve my immediate problem using the attached patch, but maybe I'm not approaching this the right way (maybe the fetch method itself should do this?) - any insight would be much appreciated! thanks. -- Nathan From nscott@aconex.com Wed Mar 21 19:40:52 2007 Received: with ECARTIS (v1.0.0; list pcp); Wed, 21 Mar 2007 19:40:56 -0700 (PDT) X-Spam-oss-Status: No, score=0.0 required=5.0 tests=BAYES_50 autolearn=no version=3.2.0-pre1-r499012 Received: from postoffice.aconex.com (mail.app.aconex.com [203.89.192.138]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l2M2eo6p025766 for ; Wed, 21 Mar 2007 19:40:51 -0700 Received: from edge (unknown [203.89.192.141]) by postoffice.aconex.com (Postfix) with ESMTP id 48B12AAC3A2; Thu, 22 Mar 2007 13:40:46 +1100 (EST) Subject: Re: Possibly quirky libpcp_pmda behavior? 
From: Nathan Scott Reply-To: nscott@aconex.com To: Max Matveev Cc: dchatterton@aconex.com, pcp@oss.sgi.com In-Reply-To: <17921.59938.22791.120850@kuku.melbourne.sgi.com> References: <1174523743.5051.439.camel@edge> <17921.59938.22791.120850@kuku.melbourne.sgi.com> Content-Type: text/plain Organization: Aconex Date: Thu, 22 Mar 2007 13:41:27 +1100 Message-Id: <1174531287.5051.504.camel@edge> Mime-Version: 1.0 X-Mailer: Evolution 2.6.3 Content-Transfer-Encoding: 7bit X-archive-position: 1153 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: nscott@aconex.com Precedence: bulk X-list: pcp Content-Length: 633 Lines: 24 Hi (ghost of?) Max, On Thu, 2007-03-22 at 13:29 +1100, Max Matveev wrote: > >>>>> "nscott" == Nathan Scott writes: > > nscott> I have an agent which is returning PM_ERR_APPVERSION for some > nscott> of its metric values, for metrics not supported on a > nscott> particular platform version. > > What about returning PM_TYPE_NOSUPPORT instead of returning an error? Hmm... sounds good... /me looks No, not sure that will help - PM_TYPE_NOSUPPORT is -1, which will trip the less-than-zero guard on the fetchCallback call in libpcp_pmda, and that will end up in the same fprintf. No? cheers. 
-- Nathan From makc@melbourne.sgi.com Wed Mar 21 19:45:52 2007 Received: with ECARTIS (v1.0.0; list pcp); Wed, 21 Mar 2007 19:45:56 -0700 (PDT) X-Spam-oss-Status: No, score=0.0 required=5.0 tests=BAYES_50 autolearn=no version=3.2.0-pre1-r499012 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id l2M2jn6p026247 for ; Wed, 21 Mar 2007 19:45:51 -0700 Received: from kuku.melbourne.sgi.com (kuku.melbourne.sgi.com [134.14.55.163]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id NAA15833; Thu, 22 Mar 2007 13:29:56 +1100 Received: from kuku.melbourne.sgi.com (localhost [127.0.0.1]) by kuku.melbourne.sgi.com (SGI-8.12.11.20060308/8.12.11) with ESMTP id l2M2Ttlc371602; Thu, 22 Mar 2007 13:29:56 +1100 (EST) Received: (from makc@localhost) by kuku.melbourne.sgi.com (SGI-8.12.11.20060308/8.12.11/Submit) id l2M2TsjS371633; Thu, 22 Mar 2007 13:29:54 +1100 (EST) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <17921.59938.22791.120850@kuku.melbourne.sgi.com> Date: Thu, 22 Mar 2007 13:29:54 +1100 From: Max Matveev To: nscott@aconex.com Cc: dchatterton@aconex.com, pcp@oss.sgi.com Subject: Re: Possibly quirky libpcp_pmda behavior? In-Reply-To: <1174523743.5051.439.camel@edge> References: <1174523743.5051.439.camel@edge> X-Mailer: VM 7.07 under 21.4 (patch 15) "Security Through Obscurity" XEmacs Lucid X-archive-position: 1154 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: makc@sgi.com Precedence: bulk X-list: pcp Content-Length: 307 Lines: 9 >>>>> "nscott" == Nathan Scott writes: nscott> I have an agent which is returning PM_ERR_APPVERSION for some nscott> of its metric values, for metrics not supported on a nscott> particular platform version. What about returning PM_TYPE_NOSUPPORT instead of returning an error? 
max From makc@melbourne.sgi.com Thu Mar 22 01:27:36 2007 Received: with ECARTIS (v1.0.0; list pcp); Thu, 22 Mar 2007 01:27:41 -0700 (PDT) X-Spam-oss-Status: No, score=0.0 required=5.0 tests=BAYES_50 autolearn=no version=3.2.0-pre1-r499012 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id l2M8RX6p012386 for ; Thu, 22 Mar 2007 01:27:35 -0700 Received: from kuku.melbourne.sgi.com (kuku.melbourne.sgi.com [134.14.55.163]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id TAA27907; Thu, 22 Mar 2007 19:27:27 +1100 Received: from kuku.melbourne.sgi.com (localhost [127.0.0.1]) by kuku.melbourne.sgi.com (SGI-8.12.11.20060308/8.12.11) with ESMTP id l2M8RP8H372485; Thu, 22 Mar 2007 19:27:25 +1100 (EST) Received: (from makc@localhost) by kuku.melbourne.sgi.com (SGI-8.12.11.20060308/8.12.11/Submit) id l2M8ROF1372470; Thu, 22 Mar 2007 19:27:24 +1100 (EST) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <17922.15852.67062.432237@kuku.melbourne.sgi.com> Date: Thu, 22 Mar 2007 19:27:24 +1100 From: Max Matveev To: nscott@aconex.com Cc: pcp@oss.sgi.com Subject: Re: Possibly quirky libpcp_pmda behavior? In-Reply-To: <1174531287.5051.504.camel@edge> References: <1174523743.5051.439.camel@edge> <17921.59938.22791.120850@kuku.melbourne.sgi.com> <1174531287.5051.504.camel@edge> X-Mailer: VM 7.07 under 21.4 (patch 15) "Security Through Obscurity" XEmacs Lucid X-archive-position: 1156 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: makc@sgi.com Precedence: bulk X-list: pcp Content-Length: 845 Lines: 22 >>>>> "nscott" == Nathan Scott writes: nscott> Hi (ghost of?) Max, Boo! 
nscott> No, not sure that will help - PM_TYPE_NOSUPPORT is -1, which nscott> will trip the less-than-zero guard on the fetchCallback call nscott> in libpcp_pmda, and that will end up in the same fprintf. nscott> No? No. The idea here is to cut all calls to pmdaFetch for the metric you don't support. First of all, if a metric is "intermittent" then you have an option to return empty pmResult from the fetch - this is a valid response. But if you're sure that metric is not going to be available at all then either return error from pmLookupDesc to stop clients from calling you or give them a descriptor with NO_SUPPORT type and that should stop them too. max PS. I wonder what would it take to make pcp@oss.sgi.com a subscriber-only list? From nscott@aconex.com Thu Mar 22 15:23:43 2007 Received: with ECARTIS (v1.0.0; list pcp); Thu, 22 Mar 2007 15:23:52 -0700 (PDT) X-Spam-oss-Status: No, score=0.0 required=5.0 tests=BAYES_50 autolearn=no version=3.2.0-pre1-r499012 Received: from postoffice.aconex.com (mail.app.aconex.com [203.89.192.138]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id l2MMNe6p018993 for ; Thu, 22 Mar 2007 15:23:43 -0700 Received: from edge (unknown [203.89.192.141]) by postoffice.aconex.com (Postfix) with ESMTP id 98BD2AAC4B6; Fri, 23 Mar 2007 09:23:38 +1100 (EST) Subject: Re: Possibly quirky libpcp_pmda behavior? 
From: Nathan Scott Reply-To: nscott@aconex.com To: Max Matveev Cc: pcp@oss.sgi.com In-Reply-To: <17922.15852.67062.432237@kuku.melbourne.sgi.com> References: <1174523743.5051.439.camel@edge> <17921.59938.22791.120850@kuku.melbourne.sgi.com> <1174531287.5051.504.camel@edge> <17922.15852.67062.432237@kuku.melbourne.sgi.com> Content-Type: text/plain Organization: Aconex Date: Fri, 23 Mar 2007 09:24:26 +1100 Message-Id: <1174602266.5051.543.camel@edge> Mime-Version: 1.0 X-Mailer: Evolution 2.6.3 Content-Transfer-Encoding: 7bit X-archive-position: 1164 X-ecartis-version: Ecartis v1.0.0 Sender: pcp-bounce@oss.sgi.com Errors-to: pcp-bounce@oss.sgi.com X-original-sender: nscott@aconex.com Precedence: bulk X-list: pcp Content-Length: 1700 Lines: 48 On Thu, 2007-03-22 at 19:27 +1100, Max Matveev wrote: > >>>>> "nscott" == Nathan Scott writes: > > nscott> Hi (ghost of?) Max, > Boo! Eek! > nscott> No, not sure that will help - PM_TYPE_NOSUPPORT is -1, which > nscott> will trip the less-than-zero guard on the fetchCallback call > nscott> in libpcp_pmda, and that will end up in the same fprintf. > nscott> No? > > No. The idea here is to cut all calls to pmdaFetch for the metric you > don't support. Oh, I see - no, that won't fly either; it's a bit more dynamic than that - the metric is supported for some instances, but not others. There's one indom shared across all metrics in the agent, and some of the values for some of the metrics are sometimes not supported (depending on the "things" being monitored, which are software "things" that can be upgraded/stopped/started independently). Separate indoms for each metric aren't really an option either. > First of all, if a metric is "intermittent" then you > have an option to return empty pmResult from the fetch - this is a > valid response. 
But if you're sure that metric is not going to be > available at all then either return error from pmLookupDesc to stop > clients from calling you or give them a descriptor with NO_SUPPORT > type and that should stop them too. I see what you mean now, but in this case that's not going to fly for me. I think I'll run with a variant on Dave's patch for now. > PS. I wonder what would it take to make pcp@oss.sgi.com a > subscriber-only list? That sounds like a good idea (Chatz suggested the same yesterday)... I guess we talk to Trev (or whoever the pcp list admin is - maybe it's Mark still)? cheers. -- Nathan
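
The thread above settles on quieting PM_ERR_APPVERSION in pmdaFetch unless the agent is being debugged. The following is a minimal sketch of that decision logic only, not the actual libpcp_pmda change. The DEMO_* constants and the function classify_fetch_status() are hypothetical stand-ins invented for illustration. Only the APPVERSION value follows the `-PM_ERR_BASE-5` form quoted from pmapi.h, and the base value itself is assumed. The `#ifdef PCP_DEBUG` plus `pmDebug & DBG_TRACE_LIBPMDA` test is collapsed into a plain `debug_on` flag.

```c
/* Stand-in error codes for illustration only; the real ones live in pmapi.h. */
#define DEMO_ERR_BASE        12345                 /* assumed base value */
#define DEMO_ERR_PMID        (-DEMO_ERR_BASE - 1)  /* arbitrary demo value */
#define DEMO_ERR_INST        (-DEMO_ERR_BASE - 2)  /* arbitrary demo value */
#define DEMO_ERR_APPVERSION  (-DEMO_ERR_BASE - 5)  /* form quoted in the thread */

/* What the fetch loop would write to the PMDA log for a callback status. */
enum log_action {
    ACT_NOTHING,         /* success, or an expected error with debugging off */
    ACT_UNHANDLED_PMID,  /* always logged */
    ACT_UNHANDLED_INST,  /* logged only when debugging */
    ACT_UNSUPPORTED,     /* APPVERSION: logged only when debugging */
    ACT_CALLBACK_ERROR   /* any other negative status: always logged */
};

/* Decide which message, if any, a fetch callback status should produce. */
enum log_action classify_fetch_status(int sts, int debug_on)
{
    if (sts >= 0)
        return ACT_NOTHING;
    if (sts == DEMO_ERR_PMID)
        return ACT_UNHANDLED_PMID;
    if (sts == DEMO_ERR_INST)
        return debug_on ? ACT_UNHANDLED_INST : ACT_NOTHING;
    if (sts == DEMO_ERR_APPVERSION)
        return debug_on ? ACT_UNSUPPORTED : ACT_NOTHING;
    return ACT_CALLBACK_ERROR;
}
```

With this shape, a callback returning the APPVERSION code produces no log line in normal operation but is still visible under debugging. A return of -1, the value mentioned for PM_TYPE_NOSUPPORT, still falls through to the generic error branch, which matches the objection raised earlier in the thread.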