From bclem@rice.edu Sat Mar 27 12:18:54 2004
Date: Sat, 27 Mar 2004 13:43:21 -0600 (CST)
From: "Brent M. Clements"
To: pcp@oss.sgi.com
Subject: Gathering metrics from other hosts.
X-archive-position: 359

According to the pcp project page

"A client-server architecture allows multiple clients to monitor the same
host, and a single client to monitor multiple hosts (e.g. in a Beowulf
cluster). This enables centralized monitoring of distributed processing."

But, there is nowhere in the documentation that explains how to do this.

If I wanted to do the following:

Centralized Monitoring Server(master node in cluster) would gather metrics
from multiple hosts(compute nodes in a cluster). How would one actually do
this?

Thanks,
Brent Clements

From bclem@rice.edu Sat Mar 27 13:16:22 2004
Date: Sat, 27 Mar 2004 15:16:17 -0600 (CST)
From: "Brent M. Clements"
To: Jan-Frode Myklebust
Cc: pcp@oss.sgi.com
Subject: Re: Gathering metrics from other hosts.
In-Reply-To: <20040327205404.GA11064@ii.uib.no>
References: <20040327205404.GA11064@ii.uib.no>
X-archive-position: 360

Thanks a lot.. this is EXACTLY what I needed!

I know I need to run the loggers on the master node, but should I run the
collector daemons and agents on each of the compute nodes? I.e., should I
install the pcp rpm on each of my compute nodes too and run the pcp
service?

-B

On Sat, 27 Mar 2004, Jan-Frode Myklebust wrote:

> On Sat, Mar 27, 2004 at 01:43:21PM -0600, Brent M. Clements wrote:
> >
> > Centralized Monitoring Server(master node in cluster) would gather metrics
> > from multiple hosts(compute nodes in a cluster). How would one actually do
> > this?
>
> I do this in our linux cluster (and wish more cluster-admins would see the
> value in pcp). Go to the /var/pcp/config/pmlogger/ directory on your
> frontend node, and copy config.sar to config.cluster. Edit the
> config.cluster file and comment out whatever you don't want logged.
>
> Then set up the control-file in the same directory to use this config
> for each of your nodes. I use:
>
> node1 n n /export/home/pmlogger/node1 -c ./config.cluster
> node2 n n /export/home/pmlogger/node2 -c ./config.cluster
> node3 n n /export/home/pmlogger/node3 -c ./config.cluster
> node4 n n /export/home/pmlogger/node4 -c ./config.cluster
> node5 n n /export/home/pmlogger/node5 -c ./config.cluster
> etc..
>
> Then restart pcp on this frontend node, and pmlogger will begin logging.
>
> Another thing you might want to have a look at is pmie, which can be
> used for e.g. alerting you when file systems are running full. I have
> this in /var/pcp/config/pmie/config.clusternodes
>
> delta = 4 mins;
> filesys.filling =
>     some_inst (
>         ( 100 * filesys.used /
>             filesys.capacity ) > 80
>         && filesys.used +
>             20 min * ( rate filesys.used ) >
>             filesys.capacity
>     ) -> syslog 10 min "File system is filling up" " %v%used[%i]@%h";
>
> and a similar control file /var/pcp/config/pmie/control:
>
> node1 n PCP_LOG_DIR/pmie/node1/pmie.log -c config.clusternodes
> node2 n PCP_LOG_DIR/pmie/node2/pmie.log -c config.clusternodes
> node3 n PCP_LOG_DIR/pmie/node3/pmie.log -c config.clusternodes
> node4 n PCP_LOG_DIR/pmie/node4/pmie.log -c config.clusternodes
> node5 n PCP_LOG_DIR/pmie/node5/pmie.log -c config.clusternodes
> node6 n PCP_LOG_DIR/pmie/node6/pmie.log -c config.clusternodes
> etc..
>
> Then a message will be logged to the syslog on the monitor host if file
> system usage is growing too fast.
>
> BTW: you might also want to add these entries to root's crontab
> for checking that the loggers are alive, and processing of old logs:
>
> # daily processing of archive logs
> 10 0 * * * /usr/share/pcp/bin/pmlogger_daily -k forever -x 5 -X gzip
> # every 30 minutes, check pmlogger instances are running
> 25,55 * * * * /usr/share/pcp/bin/pmlogger_check
> # every 30 minutes, check pmie instances are running
> 24,54 * * * * /usr/share/pcp/bin/pmie_check
>
>
> -jf
>

From chatz@melbourne.sgi.com Sat Mar 27 15:31:56 2004
Message-ID: <40660C14.8060004@melbourne.sgi.com>
Date: Sun, 28 Mar 2004 09:19:48 +1000
From: David Chatterton
Reply-To: chatz@melbourne.sgi.com
Organization: SGI
To: "Brent M. Clements"
CC: Jan-Frode Myklebust , pcp@oss.sgi.com
Subject: Re: Gathering metrics from other hosts.
References: <20040327205404.GA11064@ii.uib.no>
X-archive-position: 361

Brent,

Brent M. Clements wrote:
> Thanks a lot.. this is EXACTLY what I needed!
>
> I know I need to run the loggers on the master node, but should I run the
> collector daemons and agents on each of the compute nodes?
>

Yes, you need the pmcd daemon and agents running on each node. All the
monitoring tools (pmlogger etc) can then collect metrics from any of
those nodes, so you can collect all the data in one place if you like,
rather than collecting logs on each node.

David

--
David Chatterton             Phone:  +61 3 9834 8234
CXFS MultiOS Eng Manager     Mobile: +61 409 154 121
SGI Australia                VNET:   524-8234
External: http://www.sgi.com/products/storage

From kjw@pocket.rightsock.com Sat Mar 27 18:14:38 2004
Date: Sat, 27 Mar 2004 18:14:05 -0800
From: Kevin Wang
To: "Brent M. Clements" , David Chatterton
Cc: pcp@oss.sgi.com, Jan-Frode Myklebust
Subject: Re: Gathering metrics from other hosts.
Message-ID: <20040328021405.GA23829@rightsock.com>
References: <20040327205404.GA11064@ii.uib.no> <40660C14.8060004@melbourne.sgi.com>
In-Reply-To: <40660C14.8060004@melbourne.sgi.com>
X-archive-position: 362

From Brent M. Clements
> According to the pcp project page
>
> "A client-server architecture allows multiple clients to monitor the same
> host, and a single client to monitor multiple hosts (e.g. in a Beowulf
> cluster). This enables centralized monitoring of distributed processing."
>
> But, there is nowhere in the documentation that explains how to do this.
>
> If I wanted to do the following:
>
> Centralized Monitoring Server(master node in cluster) would gather metrics
> from multiple hosts(compute nodes in a cluster). How would one actually do
> this?

As David Chatterton elaborated, pcp is made up of server and client
software. pmcd is the daemon that listens on the network for requests
for data; pmlogger is *one* of the clients that talks to the networked
pmcd daemons. There are lots of other tools available, but all of the
tools have the ability to talk across the network to any server.

The tools are extremely generalized and they all can read/write from the
standard pmcd network daemons and archive files.

Note that any network requests are lossy, so if you're collecting
performance data and can't afford to lose any data points, you still
need to log locally to disk.

   - Kevin Wang, kjw@rightsock.com
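
For a cluster with many nodes, typing the per-node control entries by hand
gets tedious. Below is a minimal Python sketch that prints control lines in
the same shape as the pmlogger and pmie entries quoted earlier in this
thread. The function names are hypothetical helpers, not part of pcp, and
the paths and config names are simply the ones used in the thread; adjust
them to your own layout. The flag columns are copied as-is rather than
interpreted.

```python
"""Sketch: generate per-node control-file entries like those in this thread."""


def pmlogger_control_lines(nodes, log_root="/export/home/pmlogger",
                           config="./config.cluster"):
    """One line per host: host, two 'n' columns, archive directory, arguments."""
    return [f"{node} n n {log_root}/{node} -c {config}" for node in nodes]


def pmie_control_lines(nodes, config="config.clusternodes"):
    """One line per host: host, one 'n' column, log file path, arguments."""
    return [f"{node} n PCP_LOG_DIR/pmie/{node}/pmie.log -c {config}"
            for node in nodes]


if __name__ == "__main__":
    # Hypothetical node names following the node1, node2, ... pattern above.
    nodes = [f"node{i}" for i in range(1, 6)]
    print("\n".join(pmlogger_control_lines(nodes)))
    print("\n".join(pmie_control_lines(nodes)))
```

The output is meant to be reviewed and then appended to the respective
control files on the frontend node; pmcd still has to be running on every
compute node for the loggers to have something to connect to.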