From owner-pcp@oss.sgi.com Fri Aug 4 07:34:30 2000 Received: by oss.sgi.com id ; Fri, 4 Aug 2000 07:34:20 -0700 Received: from heffalump.fnal.gov ([131.225.9.20]:12338 "EHLO fnal.gov") by oss.sgi.com with ESMTP id ; Fri, 4 Aug 2000 07:33:57 -0700 Received: from thebrain.fnal.gov ([131.225.80.75]) by smtp.fnal.gov (PMDF V6.0-24 #44770) with ESMTP id <0FYR00J4EV3Q45@smtp.fnal.gov> for pcp@oss.sgi.com; Fri, 04 Aug 2000 09:33:26 -0500 (CDT) Received: from fnal.gov (localhost.localdomain [127.0.0.1]) by thebrain.fnal.gov (8.10.2/8.10.2) with ESMTP id e74EXP212111; Fri, 04 Aug 2000 09:33:25 -0500 Date: Fri, 04 Aug 2000 09:33:25 -0500 From: Troy Dawson Subject: pmlogger - num processes To: PCP Mailing List Message-id: <398AD435.72079E11@fnal.gov> MIME-version: 1.0 X-Mailer: Mozilla 4.73 [en] (X11; U; Linux 2.2.16-3smp i686) Content-type: text/plain; charset=us-ascii Content-transfer-encoding: 7bit X-Accept-Language: en Sender: owner-pcp@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;pcp-outgoing Howdy, I just ran into a problem, that might already be fixed (I'm on pcp 2.1.4) but I thought I'd bring it up. It concerns pmlogger when you are monitoring lots of systems. Basically there is a separate process that runs for each machine that you are logging. I'm sure that this makes the gathering of data and such much quicker, but it does have a drawback when the number of machines you are monitoring gets up high, like several hundred or thousand. Basically the problem is this. According to the error message I have, the VFS (Virtual File System) running on Linux can only access a maximum of 4096 files at a time. After that the machine basically goes belly up. So if you have 250 loggers going, each of which normally has 5 files open, you have 1250 files open. 
Now when you do the log rotate, I can't tell for sure, but I believe you have a minimum of 10 files open, and possibly 15, so the number jumps to 2500 (for 10 files) plus the original 1250 equals 3750, which is getting awfully close to the limit. If it is 15, you're already there. OK, so you can guess why I'm writing this: yesterday I added 50 more machines to my logger, and at log rotation time the machine choked. (Just to note, it didn't crash, you just couldn't actually do anything useful.) Anyway, this is a problem that probably needs to be looked at. Troy -- __________________________________________________ Troy Dawson dawson@fnal.gov (630)840-6468 Fermilab ComputingDivision/OSS CSS Group __________________________________________________ From owner-pcp@oss.sgi.com Fri Aug 4 13:16:03 2000 Received: by oss.sgi.com id ; Fri, 4 Aug 2000 13:15:43 -0700 Received: from deliverator.sgi.com ([204.94.214.10]:41823 "EHLO deliverator.sgi.com") by oss.sgi.com with ESMTP id ; Fri, 4 Aug 2000 13:15:20 -0700 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by deliverator.sgi.com (980309.SGI.8.8.8-aspam-6.2/980310.SGI-aspam) via SMTP id NAA10954 for ; Fri, 4 Aug 2000 13:07:15 -0700 (PDT) mail_from (markgw@sgi.com) Received: from sandpit.melbourne.sgi.com (sandpit.melbourne.sgi.com [134.14.55.132]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id GAA11428; Sat, 5 Aug 2000 06:13:28 +1000 Date: Sat, 5 Aug 2000 06:13:27 +1000 (EST) From: Mark Goodwin X-Sender: markgw@sandpit.melbourne.sgi.com To: Troy Dawson cc: PCP Mailing List Subject: Re: pmlogger - num processes In-Reply-To: <398AD435.72079E11@fnal.gov> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-pcp@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;pcp-outgoing Troy, these are all pretty tough questions. 
The only short term solution I can think of is to rebuild the kernel on your logging machine to handle more file descriptors and/or split the logging load between several machines and cross mount the archives. You should also be able to run the loggers on an SGI IRIX system, if you happen to have one - IRIX generated PCP archives are binary compatible with those from PCP on Linux. Other than that, I'm sure you'll raise the issues with Ken when he gets there - ask him why we don't support multiple hosts per archive ... that'll get things moving .. ;-) thanks -- Mark On Fri, 4 Aug 2000, Troy Dawson wrote: > Howdy, > I just ran into a problem, that might already be fixed (I'm on pcp 2.1.4) but > I thought I'd bring it up. It concerns pmlogger when you are monitoring lots > of systems. > Basically there is a seperate process that runs for each machine that you are > logging. I'm sure that this makes the gathering of data and such much > quicker, but it does have a drawback when the number of machines you are > monitoring gets up high, like several hundred or thousand. > Basically the problem is this. According to the error message I have, the VFS > (Virtual File Server) running on Linux can only access a maximum of 4096 files > at a time. After that the machine basically goes belly up. So if you have > 250 loggers going, each of them normally open 5 files, you have 1250 files > open. Now when you do the log rotate, I can't tell for sure, but I believe > you have a minumum of 10 files open, and possibly 15, the number jumps to 2500 > (for 10 files) plus the original 1250 equals 3750, which is getting awfully > close to the limit. If it is 15, your already there. > OK, so you can guess why I'm writting this, yesterday, I added 50 more > machines to my logger, and at log rotation time, the machine choked. (Just to > note, it didn't crash, you just couldn't actually do anything useful) > Anyway, this is a problem that probrubly needs to be looked at. 
> Troy > From owner-pcp@oss.sgi.com Sun Aug 6 23:38:16 2000 Received: by oss.sgi.com id ; Sun, 6 Aug 2000 23:38:07 -0700 Received: from hermes.mixx.net ([212.84.196.2]:22291 "HELO hermes.mixx.net") by oss.sgi.com with SMTP id ; Sun, 6 Aug 2000 23:37:30 -0700 Received: from mate.bln.innominate.de (cerberus.innominate.de [212.84.234.251]) by hermes.mixx.net (Postfix) with ESMTP id 07A7CF83C for ; Mon, 7 Aug 2000 08:36:59 +0200 (CEST) Received: by mate.bln.innominate.de (Postfix, from userid 9) id 88B6C2CA6B; Mon, 7 Aug 2000 08:36:58 +0200 (CEST) From: Thomas Graichen Reply-To: Thomas Graichen X-Newsgroups: innominate.list.sgi.pcp Subject: pmlogger not in rpm Date: 7 Aug 2000 06:36:58 GMT Organization: innominate AG, Berlin, Germany Lines: 10 Distribution: local Message-ID: Reply-To: thomas.graichen@innominate.de X-Trace: mate.bln.innominate.de 965630218 19798 10.0.0.69 (7 Aug 2000 06:36:58 GMT) X-Complaints-To: news@innominate.de User-Agent: tin/1.4.2-20000205 ("Possession") (UNIX) (Linux/2.2.16-local (i586)) To: pcp@oss.sgi.com Sender: owner-pcp@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;pcp-outgoing is there any special reason why the pmlogger is not part of the pcp rpm (but in the source and even built) ? 
t -- thomas.graichen@innominate.de Technical Director innominate AG Clustering & Security networking people tel: +49.30.308806-13 fax: -77 http://innominate.de From owner-pcp@oss.sgi.com Sun Aug 6 23:49:57 2000 Received: by oss.sgi.com id ; Sun, 6 Aug 2000 23:49:47 -0700 Received: from hermes.mixx.net ([212.84.196.2]:4356 "HELO hermes.mixx.net") by oss.sgi.com with SMTP id ; Sun, 6 Aug 2000 23:48:58 -0700 Received: from mate.bln.innominate.de (cerberus.innominate.de [212.84.234.251]) by hermes.mixx.net (Postfix) with ESMTP id 7DD21F83D for ; Mon, 7 Aug 2000 08:48:26 +0200 (CEST) Received: by mate.bln.innominate.de (Postfix, from userid 9) id 6384B2CA6B; Mon, 7 Aug 2000 08:48:26 +0200 (CEST) From: Thomas Graichen Reply-To: Thomas Graichen X-Newsgroups: innominate.list.sgi.pcp Subject: Re: pmlogger not in rpm Date: 7 Aug 2000 06:48:26 GMT Organization: innominate AG, Berlin, Germany Lines: 16 Distribution: local Message-ID: References: Reply-To: thomas.graichen@innominate.de X-Trace: mate.bln.innominate.de 965630906 19798 10.0.0.69 (7 Aug 2000 06:48:26 GMT) X-Complaints-To: news@innominate.de User-Agent: tin/1.4.2-20000205 ("Possession") (UNIX) (Linux/2.2.16-local (i586)) To: pcp@oss.sgi.com Sender: owner-pcp@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;pcp-outgoing Thomas Graichen wrote: > is there any special reason why the pmlogger is not part of the pcp > rpm (but in the source and even built) ? sorry - mistyped the rpm command to find it (rpm -qa | grep instead of rpm -ql pcp | grep) - sorry for the confusion ... 
so it is there :-) t -- thomas.graichen@innominate.de Technical Director innominate AG Clustering & Security networking people tel: +49.30.308806-13 fax: -77 http://innominate.de From owner-pcp@oss.sgi.com Mon Aug 7 09:40:01 2000 Received: by oss.sgi.com id ; Mon, 7 Aug 2000 09:39:42 -0700 Received: from deliverator.sgi.com ([204.94.214.10]:1811 "EHLO deliverator.sgi.com") by oss.sgi.com with ESMTP id ; Mon, 7 Aug 2000 09:39:23 -0700 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by deliverator.sgi.com (980309.SGI.8.8.8-aspam-6.2/980310.SGI-aspam) via SMTP id XAA03208 for ; Sun, 6 Aug 2000 23:38:20 -0700 (PDT) mail_from (markgw@sgi.com) Received: from sandpit.melbourne.sgi.com (sandpit.melbourne.sgi.com [134.14.55.132]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id QAA24470; Mon, 7 Aug 2000 16:43:18 +1000 Date: Mon, 7 Aug 2000 16:43:18 +1000 (EST) From: Mark Goodwin X-Sender: markgw@sandpit.melbourne.sgi.com To: Thomas Graichen , thomas.graichen@innominate.de cc: pcp@oss.sgi.com Subject: Re: pmlogger not in rpm In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-pcp@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;pcp-outgoing On 7 Aug 2000, Thomas Graichen wrote: > is there any special reason why the pmlogger is not part of the pcp > rpm (but in the source and even built) ? > Which version (specifically) of the PCP RPM is missing pmlogger? Did you look in /usr/share/pcp/bin? 
From owner-pcp@oss.sgi.com Mon Aug 7 10:00:31 2000 Received: by oss.sgi.com id ; Mon, 7 Aug 2000 10:00:21 -0700 Received: from pneumatic-tube.sgi.com ([204.94.214.22]:48476 "EHLO pneumatic-tube.sgi.com") by oss.sgi.com with ESMTP id ; Mon, 7 Aug 2000 10:00:01 -0700 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by pneumatic-tube.sgi.com (980327.SGI.8.8.8-aspam/980310.SGI-aspam) via SMTP id XAA05993 for ; Sun, 6 Aug 2000 23:49:27 -0700 (PDT) mail_from (max@kuku.melbourne.sgi.com) Received: from kuku.melbourne.sgi.com (kuku.melbourne.sgi.com [134.14.55.163]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id QAA24467; Mon, 7 Aug 2000 16:42:10 +1000 Received: (from max@localhost) by kuku.melbourne.sgi.com (SGI-8.9.3/8.9.3) id QAA26587; Mon, 7 Aug 2000 16:42:10 +1000 (EST) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <14734.23106.502595.632550@kuku.melbourne.sgi.com> Date: Mon, 7 Aug 2000 16:42:10 +1000 (EST) From: Max Matveev To: Thomas Graichen Cc: pcp@oss.sgi.com Subject: Re: pmlogger not in rpm In-Reply-To: References: X-Mailer: VM 6.72 under 21.1 (patch 10) "Capitol Reef" XEmacs Lucid Sender: owner-pcp@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;pcp-outgoing Thomas Graichen writes: :> is there any special reason why the pmlogger is not part of the pcp :> rpm (but in the source and even built) ? Hm, last time I've seen in in /usr/share/pcp/bin/pmlogger. 
max From owner-pcp@oss.sgi.com Fri Aug 11 04:01:52 2000 Received: by oss.sgi.com id ; Fri, 11 Aug 2000 04:01:47 -0700 Received: from deliverator.sgi.com ([204.94.214.10]:32611 "EHLO deliverator.sgi.com") by oss.sgi.com with ESMTP id ; Fri, 11 Aug 2000 04:01:16 -0700 Received: from rattle.melbourne.sgi.com (rattle.melbourne.sgi.com [134.14.55.145]) by deliverator.sgi.com (980309.SGI.8.8.8-aspam-6.2/980310.SGI-aspam) via ESMTP id DAA27952 for ; Fri, 11 Aug 2000 03:25:04 -0700 (PDT) mail_from (kenmcd@melbourne.sgi.com) From: kenmcd@melbourne.sgi.com Received: from localhost (kenmcd@localhost) by rattle.melbourne.sgi.com (SGI-8.9.3/8.9.3) with ESMTP id UAA24330; Fri, 11 Aug 2000 20:31:15 +1000 (AEST) X-Authentication-Warning: rattle.melbourne.sgi.com: kenmcd owned process doing -bs Date: Fri, 11 Aug 2000 20:31:15 +1000 Reply-To: kenmcd@melbourne.sgi.com To: Nathan Scott cc: lemming@arthur.plbohnice.cz, pcp@oss.sgi.com Subject: Re: Archive interpolation mode question In-Reply-To: <10006091038.ZM12744@wobbly.melbourne.sgi.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-pcp@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;pcp-outgoing [my reply is very late, hope it is of use to others exploring the wonders of PCP archives] On Fri, 9 Jun 2000, Nathan Scott wrote: > > On Jun 9, 1:41am, Michal Kara wrote: > > Subject: Archive interpolation mode question > > Hello! > > > > I have added archive mode to PCPMON. It works fine, but when I use the > > "interpolation" mode, PCP refuses to fetch first two (or three?) metrics > > (i think you mean the first 2/3 values for metrics, not the first > 2/3 metrics?) > > > from the archive. I guess it is because the interpolation algorithm needs few > > previous values, I just want to be sure. > > > > pfft - which bit of the code in libpcp/src/interp.c didn't you > understand? (all of it? join the club!!) 8-) > > this is truly complex stuff ... 
[explanation deleted]

Nathan nailed most of the points, just let me add a few more:

1. For metrics with counter semantics, the value at time t is computed
   by linear interpolation from the value closest to t and <= t, and
   the value closest to t and >= t.

                          t
                          |
        ----+--+----------|---+------+---   values in the archive at +
            ^  ^              ^      ^
            |  |              |      |
            t6 t7             t8     t9

   so we'd use t7 and t8 in the example above.

   Now the tj are not necessarily equally spaced, and not all
   metric-value pairs appear in each tj. Herein lies much of the
   complexity in finding t7 and t8 without excessive searching.
   Of course things like random seeking into the archive via
   pmSetMode() and reading forwards or backwards are not going to
   make it simpler either.

2. Monitoring tools tend to use constant time increments, so the
   following might happen for a single metric-value with counter
   semantics:
   + the first K monitor times are < the first observation in the
     archive
   So an interpolated value is not available until the (K+1)th
   monitor time.

3. Tools consuming counter metrics need 2 samples before they can
   compute a result. So in the example above, the rate is not
   available until the (K+2)th monitor time.

   Any value >= 0 is possible for K ... so having nothing to report
   for two, three, ... samples is perfectly possible (especially if
   the sample time interval is about the same as, or smaller than,
   the mean time between observations in the archive).

4. For non-counter metrics the story is similar, but different as
   follows:
   + no interpolation
   + instantaneous metrics must be bounded like counters before a
     value is returned
   + bounding is not required for discrete metrics

> > P.S.: If it is as I think, it would be nice to leave a note in pmSetMode(3)
> > manpage.
>
> this could easily be the subject of a lengthy white paper, i'm sure.

We (er, I mean I), should add this to the PCP tutorial pages, and then
add these to the open source release ... it's now on my TODO list.
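The counter case in points 1 to 3 can be sketched in a few lines of Python. This is only an illustration of the bounding-and-interpolating idea, not the code in libpcp/src/interp.c: the function names, the (timestamp, value) list layout and the sample numbers are all made up for the example, and it ignores the hard parts mentioned above (per-metric gaps, seeking, reading backwards).

```python
from bisect import bisect_left, bisect_right


def interpolate_counter(observations, t):
    """Linearly interpolate a counter at time t.

    observations is a hypothetical list of (timestamp, value) pairs,
    sorted by timestamp. Returns None when t is not bounded by an
    observation on both sides, which is the "no value yet" case.
    """
    times = [ts for ts, _ in observations]
    hi = bisect_left(times, t)        # first observation with ts >= t
    lo = bisect_right(times, t) - 1   # last observation with ts <= t
    if lo < 0 or hi >= len(times):
        return None
    t_lo, v_lo = observations[lo]
    t_hi, v_hi = observations[hi]
    if t_hi == t_lo:                  # t falls exactly on an observation
        return v_lo
    return v_lo + (v_hi - v_lo) * (t - t_lo) / (t_hi - t_lo)


def replay_rates(observations, start, step, count):
    """Walk constant-interval monitor times, as a monitoring tool would.

    Returns (time, interpolated value, rate) tuples; value and rate are
    None until enough bounded samples exist.
    """
    results = []
    prev = None
    for i in range(count):
        t = start + i * step
        value = interpolate_counter(observations, t)
        rate = None
        if value is not None and prev is not None:
            rate = (value - prev[1]) / (t - prev[0])
        results.append((t, value, rate))
        prev = (t, value) if value is not None else None
    return results


if __name__ == "__main__":
    # Made-up archive: unevenly spaced observations of one counter.
    archive = [(10.0, 100.0), (13.0, 130.0), (20.0, 200.0), (27.0, 270.0)]
    for row in replay_rates(archive, start=4.0, step=3.0, count=5):
        print(row)
```

With these made-up numbers the first two monitor times (4 and 7) fall before the first observation, so K is 2: the first interpolated value shows up at the third monitor time and the first rate at the fourth, matching the (K+1)th and (K+2)th rule above.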
From owner-pcp@oss.sgi.com Wed Aug 16 15:54:55 2000 Received: by oss.sgi.com id ; Wed, 16 Aug 2000 15:54:35 -0700 Received: from pneumatic-tube.sgi.com ([204.94.214.22]:18760 "EHLO pneumatic-tube.sgi.com") by oss.sgi.com with ESMTP id ; Wed, 16 Aug 2000 15:54:31 -0700 Received: from rattle.melbourne.sgi.com (rattle.melbourne.sgi.com [134.14.55.145]) by pneumatic-tube.sgi.com (980327.SGI.8.8.8-aspam/980310.SGI-aspam) via ESMTP id KAA01615 for ; Wed, 16 Aug 2000 10:37:34 -0700 (PDT) mail_from (kenmcd@melbourne.sgi.com) From: kenmcd@melbourne.sgi.com Received: from localhost (kenmcd@localhost) by rattle.melbourne.sgi.com (SGI-8.9.3/8.9.3) with ESMTP id DAA21809; Thu, 17 Aug 2000 03:28:48 +1000 (AEST) X-Authentication-Warning: rattle.melbourne.sgi.com: kenmcd owned process doing -bs Date: Thu, 17 Aug 2000 03:28:48 +1000 Reply-To: kenmcd@melbourne.sgi.com To: Troy Dawson cc: PCP Mailing List Subject: Re: pmlogger - num processes In-Reply-To: <398AD435.72079E11@fnal.gov> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-pcp@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;pcp-outgoing On Fri, 4 Aug 2000, Troy Dawson wrote: > Howdy, > I just ran into a problem, that might already be fixed (I'm on pcp 2.1.4) but > I thought I'd bring it up. It concerns pmlogger when you are monitoring lots > of systems. > Basically there is a seperate process that runs for each machine that you are > logging. I'm sure that this makes the gathering of data and such much > quicker, but it does have a drawback when the number of machines you are > monitoring gets up high, like several hundred or thousand. > Basically the problem is this. According to the error message I have, the VFS > (Virtual File Server) running on Linux can only access a maximum of 4096 files > at a time. After that the machine basically goes belly up. So if you have > 250 loggers going, each of them normally open 5 files, you have 1250 files > open. ... 1 . stderr 2 . 
control socket for pmlc
3 . log meta data
4 . log data
5 . log index

So, 5 fd's per pmlogger is correct. We could possibly provide an option
to not allow pmlc control and claw one back ... but you'll eventually
need a kernel reconfig to increase the number of fds (I am assuming
this is a tuneable that can be systune'd up, especially since 4096
seems like a tiny number, at least from our IRIX perspective).

> ... Now when you do the log rotate, I can't tell for sure, but I believe
> you have a minumum of 10 files open, and possibly 15, the number jumps to 2500
> (for 10 files) plus the original 1250 equals 3750, which is getting awfully
> close to the limit. If it is 15, your already there.
> OK, so you can guess why I'm writting this, yesterday, I added 50 more
> machines to my logger, and at log rotation time, the machine choked. (Just to
> note, it didn't crash, you just couldn't actually do anything useful)
> Anyway, this is a problem that probrubly needs to be looked at.

I don't think the analysis is correct here. The pmloggers are stopped
and restarted one at a time, so the fd demand should not change.

The one wild card is pmlogmerge, which concatenates all of today's
logs for each host together ... again this is done one host at a time,
but if pmlogger was restarted N times during the day for a single host,
then pmlogmerge needs (N+1)*3 + 1 fd's.

If this problem persists, perhaps you could snap an ls -R of the pcplog
directories before the cron job so I can investigate some more.
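The fd arithmetic in this thread is easy to put into a small Python sketch. The two formulas (5 fds per pmlogger, (N+1)*3 + 1 fds for pmlogmerge) come from the message above; the function names and the peak estimate are assumptions for illustration. In particular, the peak assumes every pmlogger is running while a single pmlogmerge works on the host with the most restarts, and it counts nothing else on the machine, so the real system-wide total will be higher.

```python
# From the message above: stderr, pmlc control socket, metadata, data, index.
FDS_PER_PMLOGGER = 5
# An archive is three files (data volume, .meta, .index).
FILES_PER_ARCHIVE = 3


def steady_state_fds(num_loggers):
    """fds held by the pmlogger processes alone."""
    return num_loggers * FDS_PER_PMLOGGER


def pmlogmerge_fds(restarts):
    """fds pmlogmerge needs for one host whose pmlogger was restarted
    `restarts` times during the day: (N+1)*3 + 1."""
    return (restarts + 1) * FILES_PER_ARCHIVE + 1


def peak_fds(num_loggers, max_restarts):
    """Hypothetical worst case: all loggers up plus one pmlogmerge."""
    return steady_state_fds(num_loggers) + pmlogmerge_fds(max_restarts)


if __name__ == "__main__":
    limit = 4096  # the limit reported in the original message
    for loggers in (250, 300):
        peak = peak_fds(loggers, max_restarts=3)
        print(loggers, "loggers:", peak, "fds, headroom", limit - peak)
```

On these numbers 300 loggers account for roughly 1500 fds, well short of 4096, which fits the point that log rotation by itself should not double or triple the demand.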
From owner-pcp@oss.sgi.com Wed Aug 16 23:06:07 2000 Received: by oss.sgi.com id ; Wed, 16 Aug 2000 23:05:57 -0700 Received: from tah14.ctt.cz ([194.108.115.182]:6163 "EHLO arthur.plbohnice.cz") by oss.sgi.com with ESMTP id ; Wed, 16 Aug 2000 23:05:46 -0700 Received: (from lemming@localhost) by arthur.plbohnice.cz (8.9.3/8.10.1) id IAA11306 for pcp@oss.sgi.com; Thu, 17 Aug 2000 08:05:46 +0200 Date: Thu, 17 Aug 2000 08:05:46 +0200 From: lemming@arthur.plbohnice.cz To: pcp@oss.sgi.com Subject: Re: pmlogger - num processes Message-ID: <20000817080546.A11291@arthur.plbohnice.cz> References: <398AD435.72079E11@fnal.gov> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0.1i In-Reply-To: ; from kenmcd@melbourne.sgi.com on Thu, Aug 17, 2000 at 03:28:48AM +1000 Sender: owner-pcp@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;pcp-outgoing > need a kernel reconfig to increase the number of fds (I am assuming > this is a tuneable that can be systune'd up, especially since 4096 > seems like a tiny number, at least from our IRIX perspective). > Yes, in kernel 2.2.x and newer the maximum number of files can be changed at run time through the /proc/sys/fs/file-max "file". 
Michal Kara From owner-pcp@oss.sgi.com Sun Aug 20 14:05:15 2000 Received: by oss.sgi.com id ; Sun, 20 Aug 2000 14:05:06 -0700 Received: from hermes.mixx.net ([212.84.196.2]:7951 "HELO hermes.mixx.net") by oss.sgi.com with SMTP id ; Sun, 20 Aug 2000 14:04:46 -0700 Received: from mate.bln.innominate.de (cerberus.innominate.de [212.84.234.251]) by hermes.mixx.net (Postfix) with ESMTP id 1754AF816 for ; Sun, 20 Aug 2000 23:04:44 +0200 (CEST) Received: by mate.bln.innominate.de (Postfix, from userid 9) id A12862CA83; Sun, 20 Aug 2000 23:04:43 +0200 (CEST) From: Thomas Graichen Reply-To: Thomas Graichen X-Newsgroups: innominate.list.sgi.pcp Subject: pcpmon with archive Date: 20 Aug 2000 21:04:43 GMT Organization: innominate AG, Berlin, Germany Lines: 15 Distribution: local Message-ID: Reply-To: thomas.graichen@innominate.de X-Trace: mate.bln.innominate.de 966805483 7366 10.0.0.69 (20 Aug 2000 21:04:43 GMT) X-Complaints-To: news@innominate.de User-Agent: tin/1.4.2-20000205 ("Possession") (UNIX) (Linux/2.2.16-local (i586)) To: pcp@oss.sgi.com Sender: owner-pcp@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;pcp-outgoing i just found a replay to the announcement of a pcpmon 1.2.95 with archive mode added - but could not find the original post - so the question: where can i find it ? - and related: what over tools are available for interpreting pcp archives ? - and again related: are there any plans from sgi to make the sgi display tool open source ? 
a lot of thanks in advance t -- thomas.graichen@innominate.de technical director innominate AG clustering & security networking people tel: +49.30.308806-13 fax: -77 http://innominate.de From owner-pcp@oss.sgi.com Sun Aug 20 14:08:56 2000 Received: by oss.sgi.com id ; Sun, 20 Aug 2000 14:08:46 -0700 Received: from hermes.mixx.net ([212.84.196.2]:12303 "HELO hermes.mixx.net") by oss.sgi.com with SMTP id ; Sun, 20 Aug 2000 14:08:38 -0700 Received: from mate.bln.innominate.de (cerberus.innominate.de [212.84.234.251]) by hermes.mixx.net (Postfix) with ESMTP id 70B6FF816 for ; Sun, 20 Aug 2000 23:08:36 +0200 (CEST) Received: by mate.bln.innominate.de (Postfix, from userid 9) id 34A032CA83; Sun, 20 Aug 2000 23:08:36 +0200 (CEST) From: Thomas Graichen Reply-To: Thomas Graichen X-Newsgroups: innominate.list.sgi.pcp Subject: Re: pcpmon with archive Date: 20 Aug 2000 21:08:36 GMT Organization: innominate AG, Berlin, Germany Lines: 18 Distribution: local Message-ID: References: Reply-To: thomas.graichen@innominate.de X-Trace: mate.bln.innominate.de 966805716 7366 10.0.0.69 (20 Aug 2000 21:08:36 GMT) X-Complaints-To: news@innominate.de User-Agent: tin/1.4.2-20000205 ("Possession") (UNIX) (Linux/2.2.16-local (i586)) To: pcp@oss.sgi.com Sender: owner-pcp@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;pcp-outgoing Thomas Graichen wrote: > i just found a replay to the announcement of a pcpmon 1.2.95 with > archive mode added - but could not find the original post - so the > question: where can i find it ? - and related: what over tools are > available for interpreting pcp archives ? - and again related: are > there any plans from sgi to make the sgi display tool open source ? 
ok - first question is answered - others stay open :-) http://k332.feld.cvut.cz/~lemming/projects/pcpmon-1.2.95.tar.gz t -- thomas.graichen@innominate.de technical director innominate AG clustering & security networking people tel: +49.30.308806-13 fax: -77 http://innominate.de From owner-pcp@oss.sgi.com Sun Aug 20 15:00:46 2000 Received: by oss.sgi.com id ; Sun, 20 Aug 2000 15:00:36 -0700 Received: from deliverator.sgi.com ([204.94.214.10]:34426 "EHLO convert rfc822-to-8bit deliverator.sgi.com") by oss.sgi.com with ESMTP id ; Sun, 20 Aug 2000 15:00:21 -0700 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by deliverator.sgi.com (980309.SGI.8.8.8-aspam-6.2/980310.SGI-aspam) via SMTP id OAA04299 for ; Sun, 20 Aug 2000 14:52:43 -0700 (PDT) mail_from (markgw@sgi.com) Received: from sandpit.melbourne.sgi.com (sandpit.melbourne.sgi.com [134.14.55.132]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id HAA10420; Mon, 21 Aug 2000 07:58:53 +1000 Date: Mon, 21 Aug 2000 07:58:53 +1000 (EST) From: Mark Goodwin X-Sender: markgw@sandpit.melbourne.sgi.com To: Thomas Graichen , thomas.graichen@innominate.de cc: pcp@oss.sgi.com Subject: Re: pcpmon with archive In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=X-UNKNOWN Content-Transfer-Encoding: 8BIT Sender: owner-pcp@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;pcp-outgoing On 20 Aug 2000, Thomas Graichen wrote: > Thomas Graichen wrote: > > i just found a replay to the announcement of a pcpmon 1.2.95 with > > archive mode added - but could not find the original post - so the > > question: where can i find it ? - and related: what over tools are > > available for interpreting pcp archives ? - and again related: are > > there any plans from sgi to make the sgi display tool open source ? 
>
> ok - first question is answered - others stay open :-)
>
> http://k332.feld.cvut.cz/~lemming/projects/pcpmon-1.2.95.tar.gz
>

Sorry, there are no plans to open source any of the PCP graphical tools,
but if you want to develop a tool (like pcpmon for example), we're more
than willing to help you get started and debug your application.

The open source release of PCP includes the following tools for
replaying and/or manipulating archives. See the relevant man pages
for details.

Monitoring tools that can replay archives:

    pmstat - high-level system performance overview
        pmstat [-A align] [-a archive] [-d] [-h host] [-H file] [-L]
            [-n pmnsfile] [-O offset] [-p] [-S starttime] [-s samples]
            [-T endtime] [-t interval] [-Z timezone] [-z]

    pmclient - a simple performance metrics client
        pmclient [-a archive] [-h host] [-n pmnsfile] [-p] [-S numsec]
            [-s samples] [-t interval] [-Z timezone] [-z]

    pmie - inference engine for performance metrics
        pmie [-bCdfVvWxz] [-A align] [-a archive] [-c filename]
            [-h host] [-l logfile] [-n pmnsfile] [-O offset]
            [-S starttime] [-T endtime] [-t interval] [-Z timezone]
            [filename ...]

    pminfo - display information about performance metrics
        pminfo [-dfFMmtTvz] [-a archive] [-b batchsize] [-h hostname]
            [-n pmnsfile] [-O time] [-Z timezone] [metricname ...]

    pmprobe - lightweight probe for performance metrics
        pmprobe [-IiVv] [-a archive] [-h hostname] [-n pmnsfile]
            [metricname ...]

    pmlogsummary - calculate averages of metrics stored in a PCP archive
        pmlogsummary [-abfFiIlmMNxyz] [-B nbins] [-n pmnsfile]
            [-p precision] [-S starttime] [-T endtime] [-Z timezone]
            archive [metricname ...]

PCP archive creation:

    pmlogger - create archive log for performance metrics
        /usr/share/pcp/bin/pmlogger [-c configfile] [-h host]
            [-l logfile] [-L] [-n pmnsfile] [-P] [-r] [-s endsize]
            [-t interval] [-T endtime] [-v volsize] [-V version]
            [-x fd] archive

PCP Library functions for archive creation:

    See PMRECORDSETUP(3) man page.

Archive management:

    pmdumplog - dump internal details of a performance metrics archive log
        pmdumplog [-adiLlmrstz] [-n pmnsfile] [-S starttime]
            [-T endtime] [-Z timezone] archive [metricname ...]

    pmlogextract - reduce, extract, concatenate and merge PCP archives
        /usr/share/pcp/bin/pmlogextract [-wz] [-c configfile]
            [-n pmnsfile] [-S starttime] [-s samples] [-T endtime]
            [-v volsamples] [-Z timezone] input [...] output

    pmlogcheck - checks for invalid data in a PCP archive
        pmlogcheck [-lz] [-n pmnsfile] [-S start] [-T finish]
            [-Z timezone] archive

    pmlc - configure active Performance Co-Pilot pmlogger(s) interactively
        pmlc [-e] [-h host] [-i] [-n pmnsfile] [-P] [-p port]
            [-Z timezone] [-z] [pid]

    mkaf - create a Performance Co-Pilot archive folio
        /usr/share/pcp/bin/mkaf [findopts] filename ...

    pmafm - Performance Co-Pilot archive folio manager
        pmafm folioname [command [arg ...]]

Cron based archive management:

    See pmlogger_daily(1) man page

Thanks
 -- Mark Goodwin
    SGI Engineering

From owner-pcp@oss.sgi.com Fri Aug 25 15:54:32 2000 Received: by oss.sgi.com id ; Fri, 25 Aug 2000 15:54:12 -0700 Received: from pneumatic-tube.sgi.com ([204.94.214.22]:5988 "EHLO pneumatic-tube.sgi.com") by oss.sgi.com with ESMTP id ; Fri, 25 Aug 2000 15:53:54 -0700 Received: from rattle.melbourne.sgi.com (rattle.melbourne.sgi.com [134.14.55.145]) by pneumatic-tube.sgi.com (980327.SGI.8.8.8-aspam/980310.SGI-aspam) via ESMTP id PAA09602 for ; Fri, 25 Aug 2000 15:59:42 -0700 (PDT) mail_from (kenmcd@melbourne.sgi.com) From: kenmcd@melbourne.sgi.com Received: from localhost (kenmcd@localhost) by rattle.melbourne.sgi.com (SGI-8.9.3/8.9.3) with ESMTP id IAA62805; Sat, 26 Aug 2000 08:52:01 +1000 (AEST) X-Authentication-Warning: rattle.melbourne.sgi.com: kenmcd owned process doing -bs Date: Sat, 26 Aug 2000 08:52:01 +1000 Reply-To: kenmcd@melbourne.sgi.com To: "Craig I. 
Hagan" cc: pcp@oss.sgi.com Subject: Re: bug report In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-pcp@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;pcp-outgoing

[I've included the wider audience in the reply as there are hints
herein on ways to debug this sort of problem in general]

On Tue, 22 Aug 2000, Craig I. Hagan wrote:
>
> i never heard anything. any ideas?

OK, let me try and fix that!

> On Tue, 22 Aug 2000 kenmcd@melbourne.sgi.com wrote:
>
> > On Thu, 27 Jul 2000, Craig I. Hagan wrote:
> > >
> > > mpvis doesn't appear to like the hp 6000 (6 cpu) machines
> > > running linux. I keep getting the following, followed
> > > by a blank screen:

First, which mpvis? And what sort of system is madcow? Specifically
I am interested in IRIX or Linux.

> > > [hagan@madcow hagan]$ mpvis -h test-103
> > > cpu3 cpu3 -2 cpu3

Hmm ... do you have any idea where the "cpu3 cpu3 -2 cpu3" text came from?
mpvis does not normally generate any textual output ... but this may
simply be a symptom of the real problem.

One problem may be the syntax for the CPU names in the instance domain
on the hp linux system ... the presence of the word "minute" where I'd
be expecting "cpu0" or "cpu1" or ... etc is strange.
> > > pmview: Error: test-103:kernel.percpu.cpu.idle[minute]: Unknown or > > > illegal instance identifier > > > pmview: Error: test-103:kernel.percpu.cpu.nice[minute]: Unknown or > > > illegal instance identifier > > > pmview: Error: test-103:kernel.percpu.cpu.sys[minute]: Unknown or > > > illegal instance identifier > > > pmview: Error: test-103:kernel.percpu.cpu.user[minute]: Unknown or > > > illegal instance identifier > > > pmview: Error: Bar object has no metrics > > > pmview: Y-Scale Bar Object (instances in columns) at 0,0 (1x1) collides with > > > Y-Scale Bar Object (instances in columns) at 0,0 (1x1) > > > The later object will be ignored > > > > > > test-103 is running the 2.2.15 kernel (smp) on top of redhat 6.2. In addition to the questions I asked above, it would help to see the output of: pminfo -f -h test-103 kernel.percpu.cpu.user Also the output from: $ sh `which mpvis` -h test-103 And finally, to fix this will likely require a PCP archive from test-103. Please run this command: echo "log mandatory on 2 sec kernel.percpu.cpu" \ | pmlogger -T 10 -h test-103 ken (it will take 10 seconds to finish) then mail me the THREE files ken.0, ken.meta and ken.index. From owner-pcp@oss.sgi.com Fri Aug 25 16:57:32 2000 Received: by oss.sgi.com id ; Fri, 25 Aug 2000 16:57:23 -0700 Received: from [207.113.109.32] ([207.113.109.32]:10760 "HELO cih.com") by oss.sgi.com with SMTP id ; Fri, 25 Aug 2000 16:56:45 -0700 Received: by cih.com (Postfix, from userid 21616) id 1910334B4B; Fri, 25 Aug 2000 19:56:02 -0400 (EDT) Received: from localhost (localhost [127.0.0.1]) by cih.com (Postfix) with ESMTP id 0261C33357; Fri, 25 Aug 2000 19:56:01 -0400 (EDT) Date: Fri, 25 Aug 2000 19:56:01 -0400 (EDT) From: "Craig I. 
Hagan" To: kenmcd@melbourne.sgi.com Cc: pcp@oss.sgi.com Subject: Re: bug report In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-pcp@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;pcp-outgoing > First, which mpvis? And what sort of system is madcow? Specifically > I am interested in Irix or Linux. > > > > > [hagan@madcow hagan]$ mpvis -h test-103 > > > > cpu3 cpu3 -2 cpu3 madcow is a uniproc linux machine, everything works there. test-103 is a 6 proc linux machine. > > Hmm ... do you have any idea where the "cpu3 cpu3 -2 cpu3" text came from? > mpvis does not normally generate any textual output ... but this may > simply be a symtom of the real problem. > > One problem maybe the syntax for the CPU names in the instance domain > on the hp linux system ... the presence of the word "minute" where I'd > be expecting "cpu0" or "cpu1" or ... etc is strange. > > > > > pmview: Error: test-103:kernel.percpu.cpu.idle[minute]: Unknown or > > > > illegal instance identifier > > > > pmview: Error: test-103:kernel.percpu.cpu.nice[minute]: Unknown or > > > > illegal instance identifier > > > > pmview: Error: test-103:kernel.percpu.cpu.sys[minute]: Unknown or > > > > illegal instance identifier > > > > pmview: Error: test-103:kernel.percpu.cpu.user[minute]: Unknown or > > > > illegal instance identifier > > > > pmview: Error: Bar object has no metrics > > > > pmview: Y-Scale Bar Object (instances in columns) at 0,0 (1x1) collides with > > > > Y-Scale Bar Object (instances in columns) at 0,0 (1x1) > > > > The later object will be ignored > > > > > > > > test-103 is running the 2.2.15 kernel (smp) on top of redhat 6.2. > > In addition to the questions I asked above, it would help to see > the output of: > > pminfo -f -h test-103 kernel.percpu.cpu.user > > Also the output from: > > $ sh `which mpvis` -h test-103 > > And finally, to fix this will likely require a PCP archive from > test-103. 
> Please run this command:
>
>     echo "log mandatory on 2 sec kernel.percpu.cpu" \
>     | pmlogger -T 10 -h test-103 ken
>
> (it will take 10 seconds to finish) then mail me the THREE files ken.0,
> ken.meta and ken.index.
>
>
>

From owner-pcp@oss.sgi.com Fri Aug 25 17:09:52 2000
Received: by oss.sgi.com id ; Fri, 25 Aug 2000 17:09:42 -0700
Received: from deliverator.sgi.com ([204.94.214.10]:43618 "EHLO deliverator.sgi.com") by oss.sgi.com with ESMTP id ; Fri, 25 Aug 2000 17:09:06 -0700
Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by deliverator.sgi.com (980309.SGI.8.8.8-aspam-6.2/980310.SGI-aspam) via SMTP id RAA26815 for ; Fri, 25 Aug 2000 17:00:58 -0700 (PDT) mail_from (nathans@wobbly.melbourne.sgi.com)
Received: from wobbly.melbourne.sgi.com (wobbly.melbourne.sgi.com [134.14.55.135]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id KAA06242; Sat, 26 Aug 2000 10:07:18 +1000
Received: (from nathans@localhost) by wobbly.melbourne.sgi.com (980427.SGI.8.8.8/980728.SGI.AUTOCF) id KAA11731; Sat, 26 Aug 2000 10:07:17 +1000 (EST)
From: "Nathan Scott"
Message-Id: <10008261007.ZM11755@wobbly.melbourne.sgi.com>
Date: Sat, 26 Aug 2000 10:07:15 -0500
In-Reply-To: "Craig I. Hagan" "Re: bug report" (Aug 26, 9:59am)
References:
X-Mailer: Z-Mail (3.2.3 08feb96 MediaMail)
To: "Craig I. Hagan" , kenmcd@melbourne.sgi.com
Subject: Re: bug report
Cc: pcp@oss.sgi.com
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender: owner-pcp@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;pcp-outgoing

hi,

On Aug 26, 9:59am, Craig I. Hagan wrote:
> Subject: Re: bug report
>
> madcow is a uniproc linux machine, everything works there.
> test-103 is a 6 proc linux machine.
> >
> > In addition to the questions I asked above, it would help to see
> > the output of:
> >
> >     pminfo -f -h test-103 kernel.percpu.cpu.user
> >
> > Also the output from:
> >
> >     $ sh `which mpvis` -h test-103
> >

I think Ken meant:

    $ sh -x `which mpvis` -h test-103

cos without the -x there'll be no output to send. I'd go one step
further and suggest you send the output from:

    $ sh -x `which mpvis` -V -h test-103

which will dump out the generated pmview config file as well.

> > And finally, to fix this will likely require a PCP archive from
> > test-103. Please run this command:
> >
> >     echo "log mandatory on 2 sec kernel.percpu.cpu" \
> >     | pmlogger -T 10 -h test-103 ken
> >
> > (it will take 10 seconds to finish) then mail me the THREE files ken.0,
> > ken.meta and ken.index.
> >
> >
> >
>-- End of excerpt from Craig I. Hagan

cheers.

--
Nathan