From owner-pcp@oss.sgi.com Thu Feb 14 15:57:47 2002
Received: (from majordomo@localhost)
    by oss.sgi.com (8.11.2/8.11.3) id g1ENvlY02775
    for pcp-outgoing; Thu, 14 Feb 2002 15:57:47 -0800
Received: from web12907.mail.yahoo.com (web12907.mail.yahoo.com [216.136.174.74])
    by oss.sgi.com (8.11.2/8.11.3) with SMTP id g1ENvh902772
    for ; Thu, 14 Feb 2002 15:57:43 -0800
Message-ID: <20020214225742.68050.qmail@web12907.mail.yahoo.com>
Received: from [208.244.233.2] by web12907.mail.yahoo.com via HTTP; Thu, 14 Feb 2002 14:57:42 PST
Date: Thu, 14 Feb 2002 14:57:42 -0800 (PST)
From: Uncle Than
Subject: Meminfo confusion
To: pcp@oss.sgi.com
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender: owner-pcp@oss.sgi.com
Precedence: bulk
Content-Length: 791
Lines: 26

I had been under the assumption that

mem.physmem = mem.util.free + mem.util.shared + mem.util.cached +
mem.util.bufmem.

On a RH7.2 box (kernel 2.4.7-10) this is not the case.
This is because mem.util.cached comprises cache in RAM and cache in
swap, as denoted by the new "SwapCached" entry in /proc/meminfo (which
was not in RH7.1 [kernel 2.4.2-2]).

This would be a pretty useful metric to grab, as would most of the
other one-liners in /proc/meminfo. So should I:

1) come up with a patch for the linux pmda?
2) let the maintainers do it?
3) or roll up a new one?

Given the state of my C-code lately .. I would vote for #2 *8)

Than

__________________________________________________
Do You Yahoo!?
Send FREE Valentine eCards with Yahoo! Greetings!
http://greetings.yahoo.com From owner-pcp@oss.sgi.com Thu Feb 14 16:27:35 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g1F0RZM03104 for pcp-outgoing; Thu, 14 Feb 2002 16:27:35 -0800 Received: from deliverator.sgi.com (deliverator.sgi.com [204.94.214.10]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g1F0RU903098 for ; Thu, 14 Feb 2002 16:27:30 -0800 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by deliverator.sgi.com (980309.SGI.8.8.8-aspam-6.2/980310.SGI-aspam) via SMTP id PAA14437 for ; Thu, 14 Feb 2002 15:23:05 -0800 (PST) mail_from (markgw@sgi.com) Received: from sherman.melbourne.sgi.com (sherman.melbourne.sgi.com [134.14.55.232]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id KAA28192; Fri, 15 Feb 2002 10:26:09 +1100 Date: Fri, 15 Feb 2002 10:26:09 +1100 (EST) From: Mark Goodwin X-X-Sender: To: Uncle Than cc: Subject: Re: Meminfo confusion In-Reply-To: <20020214225742.68050.qmail@web12907.mail.yahoo.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-pcp@oss.sgi.com Precedence: bulk Content-Length: 1099 Lines: 33 On Thu, 14 Feb 2002, Uncle Than wrote: > I had been under the assumption that > > mem.physmem = mem.util.free + mem.util.shared + mem.util.cached + > mem.util.bufmem. Also notice physmem as reported in /proc/meminfo does not correspond to real physical mem; it's almost the same, but does not account for a small amount of mem reserved by the kernel. A way to figure out the exact amount still eludes me .. anyone know? > > On a RH7.2 box (kernel 2.4.7-10) this is not the case. > This is due to mem.util.cached being comprised of cache in ram and > cache in swap. As denoted by the new entry (was not in RH7.1 [kernel > 2.4.2-2]) in /proc/meminfo, "SwapCached". > > This would be a pretty useful metric to grab, as would most of the > other one-liners in /proc/meminfo. 
> So should I:
>
> 1) come up with a patch for the linux pmda?
> 2) let the maintainers do it?
> 3) or roll up a new one?
>
> Given the state of my C-code lately .. I would vote for #2 *8)

Well, I would vote for #1 of course! Note that you'll need to be
backward compatible with kernels older than 2.4.7-10.

thanks
-- Mark

From owner-pcp@oss.sgi.com Fri Feb 15 00:08:22 2002
Received: (from majordomo@localhost)
    by oss.sgi.com (8.11.2/8.11.3) id g1F88Mu08843
    for pcp-outgoing; Fri, 15 Feb 2002 00:08:22 -0800
Received: from rigel.cis.ksu.edu (root@rigel.cis.ksu.edu [129.130.10.65])
    by oss.sgi.com (8.11.2/8.11.3) with SMTP id g1F88J908839
    for ; Fri, 15 Feb 2002 00:08:19 -0800
Received: from acrux.cis.ksu.edu (sada@acrux.cis.ksu.edu [129.130.10.32])
    by rigel.cis.ksu.edu (8.9.1/8.9.1/000517) with ESMTP id BAA05704
    for ; Fri, 15 Feb 2002 01:08:17 -0600 (CST)
Received: from localhost (sada@localhost)
    by acrux.cis.ksu.edu (8.9.1/8.9.1/000517) with ESMTP id BAA16649
    for ; Fri, 15 Feb 2002 01:08:17 -0600 (CST)
X-Authentication-Warning: acrux.cis.ksu.edu: sada owned process doing -bs
Date: Fri, 15 Feb 2002 01:08:17 -0600 (CST)
From: Sadanand Kota
To: pcp@oss.sgi.com
Subject: 'No PMCD agent for domain of request' ?
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-pcp@oss.sgi.com
Precedence: bulk
Content-Length: 649
Lines: 20

Hi,
I have written a PMDA to find the network characteristics of a process.
I have assigned it domain number 254 (in domain.h).
I don't have any other PMDA with the same number. But when I install the
PMDA and check it with the 'pminfo' command, it gives the error message
'No PMCD agent for domain of request'.

I have checked the pmcd.conf file for any conflicts and found
nothing. The message at the end of pmcd.log is 'Cleanup "xxx" agent
(dom 254): protocol failure for fd=14, exit(255)' where 'xxx' is the name
of my PMDA.

Also, the PMDA's installation script does not give any error messages.

Any idea how to correct it?
Sadanand From owner-pcp@oss.sgi.com Fri Feb 15 00:13:17 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g1F8DHL08906 for pcp-outgoing; Fri, 15 Feb 2002 00:13:17 -0800 Received: from pneumatic-tube.sgi.com (pneumatic-tube.sgi.com [204.94.214.22]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g1F8DF908903 for ; Fri, 15 Feb 2002 00:13:15 -0800 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by pneumatic-tube.sgi.com (980327.SGI.8.8.8-aspam/980310.SGI-aspam) via SMTP id XAA09449 for ; Thu, 14 Feb 2002 23:14:31 -0800 (PST) mail_from (makc@kuku.melbourne.sgi.com) Received: from kuku.melbourne.sgi.com (kuku.melbourne.sgi.com [134.14.55.163]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id SAA01186; Fri, 15 Feb 2002 18:11:56 +1100 Received: (from makc@localhost) by kuku.melbourne.sgi.com (SGI-8.9.3/8.9.3) id SAA28466; Fri, 15 Feb 2002 18:11:55 +1100 (EST) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15468.46267.506874.296459@kuku.melbourne.sgi.com> Date: Fri, 15 Feb 2002 18:11:55 +1100 (EST) From: Max Matveev To: Sadanand Kota Cc: pcp@oss.sgi.com Subject: Re: 'No PMCD agent for domain of request' ? In-Reply-To: References: X-Mailer: VM 6.72 under 21.4 (patch 3) "Academic Rigor" XEmacs Lucid Sender: owner-pcp@oss.sgi.com Precedence: bulk Content-Length: 551 Lines: 15 >>>>> "SK" == Sadanand Kota writes: SK> I have checked the pmcd.conf file to see for any conflicts and found SK> nothing. The message at the end of pmcd.log is 'Cleanup "xxx" agent SK> (dom 254): protocol failure for fd=14, exit(255)' where 'xxx' is name of SK> my PMDA. SK> Also, the installation scripts of PMDA does not give any error messages. Yes, but pmcd.log does - your PMDA has been started and then exited unexpectedly, pmcd notices it and cleans up, after that there is no PMDA to talk to whence the error. 
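The cleanup message quoted above carries the agent name, domain, fd and exit status, so it can be picked apart mechanically when scanning pmcd.log. The following is only a Python sketch: it assumes the log line has exactly the shape quoted in this thread (other pmcd versions may word it differently), and parse_cleanup is a made-up helper name, not part of PCP.

```python
import re

# Pattern modelled on the single log line quoted in this thread.
# Assumption: other pmcd versions may format the message differently.
CLEANUP_RE = re.compile(
    r'Cleanup "(?P<agent>[^"]+)" agent \(dom (?P<domain>\d+)\): '
    r'(?P<reason>.*?) for fd=(?P<fd>\d+), exit\((?P<status>\d+)\)'
)

def parse_cleanup(line):
    """Return agent, domain, reason, fd and exit status, or None if no match."""
    m = CLEANUP_RE.search(line)
    if m is None:
        return None
    return {
        "agent": m.group("agent"),
        "domain": int(m.group("domain")),
        "reason": m.group("reason"),
        "fd": int(m.group("fd")),
        "status": int(m.group("status")),
    }

# The message reported at the end of pmcd.log in this thread.
sample = 'Cleanup "xxx" agent (dom 254): protocol failure for fd=14, exit(255)'
info = parse_cleanup(sample)
print(info)
```

A non-None result for your agent's name means pmcd has already dropped its connection to that PMDA, which is why later pminfo requests find no agent for the domain.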
max From owner-pcp@oss.sgi.com Fri Feb 15 09:16:05 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g1FHG5K29471 for pcp-outgoing; Fri, 15 Feb 2002 09:16:05 -0800 Received: from mail.teraport.de ([62.245.135.174]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g1FHFs929453 for ; Fri, 15 Feb 2002 09:15:55 -0800 Received: from TeraPort.de ([10.10.12.32]) by mail.teraport.de (Lotus Domino Release 5.0.7) with ESMTP id 2002021517154739:908 ; Fri, 15 Feb 2002 17:15:47 +0100 Message-ID: <3C6D3433.A053B647@TeraPort.de> Date: Fri, 15 Feb 2002 17:15:47 +0100 From: Martin Knoblauch Reply-To: m.knoblauch@TeraPort.de Organization: TeraPort GmbH X-Mailer: Mozilla 4.78 [en] (X11; U; Linux 2.4.18-pre8-K2-VM-24-preempt-lock i686) X-Accept-Language: en, de MIME-Version: 1.0 To: Mark Goodwin CC: Uncle Than , pcp@oss.sgi.com Subject: Re: Meminfo confusion References: X-MIMETrack: Itemize by SMTP Server on lotus/Teraport/de(Release 5.0.7 |March 21, 2001) at 02/15/2002 05:15:47 PM, Serialize by Router on lotus/Teraport/de(Release 5.0.7 |March 21, 2001) at 02/15/2002 05:15:55 PM, Serialize complete at 02/15/2002 05:15:55 PM Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=us-ascii Sender: owner-pcp@oss.sgi.com Precedence: bulk Content-Length: 3361 Lines: 89 Mark Goodwin wrote: > > On Thu, 14 Feb 2002, Uncle Than wrote: > > > I had been under the assumption that > > > > mem.physmem = mem.util.free + mem.util.shared + mem.util.cached + > > mem.util.bufmem. > > Also notice physmem as reported in /proc/meminfo does not > correspond to real physical mem; it's almost the same, but does > not account for a small amount of mem reserved by the kernel. > A way to figure out the exact amount still eludes me .. anyone know? > I think (if we are talking about IA32 Systems with a BIOS), exact will be almost impossible. The first number in the "Mem:" line or the "MemTotal:" number are close, but not the physical memory. 
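For anyone wanting to compare these fields by script, here is a minimal Python sketch that reads the "Name: value kB" one-liners of /proc/meminfo into a dictionary. parse_meminfo is a hypothetical helper name. In the sample text only the MemTotal figure comes from this thread; the "Mem:" numbers and the SwapCached value are made up for illustration.

```python
def parse_meminfo(text):
    """Map each 'Name: <n> kB' line to an integer number of kB."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only the one-value-per-line entries; multi-column summary
        # lines such as "Mem:" do not fit this shape and are skipped.
        if sep and len(parts) == 2 and parts[0].isdigit() and parts[1] == "kB":
            fields[name.strip()] = int(parts[0])
    return fields

# Sample input: MemTotal is a figure quoted in this thread, the rest is
# invented for illustration only.
sample = (
    "Mem:  1 2 3 4 5 6\n"
    "MemTotal:       320876 kB\n"
    "SwapCached:          0 kB\n"
)
info = parse_meminfo(sample)
print(info)

# SwapCached is missing on older kernels, so look it up defensively.
print(info.get("SwapCached", "not reported by this kernel"))
```

On a live system the text would come from reading /proc/meminfo; using .get() for newer fields keeps the same code usable on kernels that do not report them.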

One solution could be to add up all lines in /proc/mtrr with
"write-back" in them. I checked it out on a few systems. Those with
kernels > 2.4.7-ac9 produce the right number. Earlier ones (2.4.2)
produce complete nonsense :-( Also, availability of /proc/mtrr seems to
be configuration dependent. But so is the whole /proc FS, of course.

The kernel keeps a variable "max_mapnr" which seems to be the number of
"mapped" physical memory pages. This is closer, but still 16, 32, 64,
128 or 256 KB off (I have seen all of these numbers). If you take this
number and round it up to the next MB/GB, you probably have a very good
estimate.

I am running the following patch on top of 2.4.18pre8 just now on my
320 MB notebook:

-xxx> diff -u fs/proc/proc_misc.c.orig fs/proc/proc_misc.c
--- fs/proc/proc_misc.c.orig Fri Feb 15 15:28:32 2002
+++ fs/proc/proc_misc.c Fri Feb 15 15:32:00 2002
@@ -170,7 +170,8 @@
                 "LowTotal: %8lu kB\n"
                 "LowFree: %8lu kB\n"
                 "SwapTotal: %8lu kB\n"
-                "SwapFree: %8lu kB\n",
+                "SwapFree: %8lu kB\n"
+                "MaxMapped: %8lu kB\n",
                 K(i.totalram),
                 K(i.freeram),
                 K(i.sharedram),
@@ -184,7 +185,8 @@
                 K(i.totalram-i.totalhigh),
                 K(i.freeram-i.freehigh),
                 K(i.totalswap),
-                K(i.freeswap));
+                K(i.freeswap),
+                K(max_mapnr));
 
         return proc_calc_metrics(page, start, off, count, eof, len);
 #undef B

The relevant lines on my notebook show:

MemTotal:   320876 kB =>  313.355 MB
MaxMapped:  327552 kB =>  319.875 MB (128 KB short of 320 MB)

On a 1GB dual P-III system it looks like:

MemTotal:  1029196 kB => 1005.074 MB
MaxMapped: 1048560 kB => 1023.984 MB (16 KB short of 1 GB)

So it seems the number is pretty reasonable. Of course, it would be
cool to find a place that accounts for the missing bits.
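The rounding idea can be checked against the two machines above. A small Python sketch (round_up_mb is just an illustrative name, not anything in the kernel or PCP) that rounds a kB figure up to the next whole MB and reports how far short the raw value was:

```python
def round_up_mb(kb):
    """Round a size in kB up to the next whole MB.

    Returns (megabytes, shortfall_kb), where shortfall_kb is how far
    the raw figure falls short of that whole number of MB.
    """
    mb = -(-kb // 1024)  # ceiling division
    return mb, mb * 1024 - kb

# MaxMapped values reported above for the notebook and the dual P-III.
for label, kb in (("notebook", 327552), ("dual P-III", 1048560)):
    mb, short = round_up_mb(kb)
    print(f"{label}: MaxMapped {kb} kB -> {mb} MB ({short} kB short)")
```

Applied to the notebook's MemTotal (320876 kB) the same rounding gives 314 MB rather than 320 MB, which is why rounding only helps when starting from the max_mapnr figure.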
Should I try to submit that patch to the kernel and take the flaming
about bloat? :-)

"dmesg" shows the following line after booting, which also has the
max_mapnr number:

Memory: 320644k/327552k available (1421k kernel code, 6520k reserved, 425k data, 232k init, 0k highmem)
----------------^^^^^^

The disadvantage is that it may scroll out of sight after extended
uptime, when dmesg fills with other stuff.

Martin
--
------------------------------------------------------------------
Martin Knoblauch       | email:  Martin.Knoblauch@TeraPort.de
TeraPort GmbH          | Phone:  +49-89-510857-309
C+ITS                  | Fax:    +49-89-510857-111
http://www.teraport.de | Mobile: +49-170-4904759

From owner-pcp@oss.sgi.com Fri Feb 15 14:03:56 2002
Received: (from majordomo@localhost)
    by oss.sgi.com (8.11.2/8.11.3) id g1FM3uH13508
    for pcp-outgoing; Fri, 15 Feb 2002 14:03:56 -0800
Received: from sgi.com (sgi-too.SGI.COM [204.94.211.39])
    by oss.sgi.com (8.11.2/8.11.3) with SMTP id g1FM3o913505
    for ; Fri, 15 Feb 2002 14:03:50 -0800
Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130])
    by sgi.com (980327.SGI.8.8.8-aspam/980304.SGI-aspam: SGI does not authorize the use of its proprietary systems or networks for unsolicited or bulk email from the Internet.)
via SMTP id NAA06272 for ; Fri, 15 Feb 2002 13:03:48 -0800 (PST) mail_from (markgw@sgi.com) Received: from sherman.melbourne.sgi.com (sherman.melbourne.sgi.com [134.14.55.232]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id IAA05852; Sat, 16 Feb 2002 08:02:23 +1100 Date: Sat, 16 Feb 2002 08:02:23 +1100 (EST) From: Mark Goodwin X-X-Sender: To: cc: Uncle Than , Subject: Re: Meminfo confusion In-Reply-To: <3C6D3433.A053B647@TeraPort.de> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-pcp@oss.sgi.com Precedence: bulk Content-Length: 1797 Lines: 49 On Fri, 15 Feb 2002, Martin Knoblauch wrote: > Mark Goodwin wrote: > > > The kernel keeps a variable "max_mapnr" which seems to be the number of > "mapped" physical memory pages. This is closer, but still 16, 32, 64, > 128 or 256 KB off (I have seen all of these numbers). If you take this > number and rond it to the next MB/GB, you probably have a very good > estimate. Well I was looking for something more definite than "probably", but it sounds like it's a better number than existing MemTotal. However, if MemTotal is supposed to be "total physical RAM installed", then we should fix it so it reports the right number (if that is indeed possible). If the correct interpretation is "total memory available to the kernel and userland after the kernel has pinched a bit", then we need to add new field, say "PhysMem". > > MemTotal: 1029196 kB => 1005.074 MB > MaxMapped: 1048560 kB => 1023.984 MB (16 KB difference) > > So it seems the number is pretty reasonable. Of course, it would be > cool to find a place that accounts the missing bits. yes we should try and find the missing bits (a mystery to be solved). > Should I try to > submit that patch to the kernel and take the flaming about bloat :-) how about we work out the real answer, and then submit a patch? 
> > "dmesg" shows the following line after booting, which also has the > max_mapnr number: > > Memory: 320644k/327552k available (1421k kernel code, 6520k reserved, > 425k data, 232k init, 0k highmem) > ----------------^^^^^^ > > Disadvantage is that it may get out of site after extended uptime when > dmesg fills with other stuff. during init we can save something like physmem = (available+init+1024) >> 20 and then report it in /proc/meminfo as Physmem: in units of mbytes. thanks -- Mark From owner-pcp@oss.sgi.com Sat Feb 16 00:40:40 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g1G8eeR24779 for pcp-outgoing; Sat, 16 Feb 2002 00:40:40 -0800 Received: from rigel.cis.ksu.edu (root@rigel.cis.ksu.edu [129.130.10.65]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g1G8eX924776 for ; Sat, 16 Feb 2002 00:40:34 -0800 Received: from acrux.cis.ksu.edu (sada@acrux.cis.ksu.edu [129.130.10.32]) by rigel.cis.ksu.edu (8.9.1/8.9.1/000517) with ESMTP id BAA28189; Sat, 16 Feb 2002 01:40:31 -0600 (CST) Received: from localhost (sada@localhost) by acrux.cis.ksu.edu (8.9.1/8.9.1/000517) with ESMTP id BAA24643; Sat, 16 Feb 2002 01:40:31 -0600 (CST) X-Authentication-Warning: acrux.cis.ksu.edu: sada owned process doing -bs Date: Sat, 16 Feb 2002 01:40:31 -0600 (CST) From: Sadanand Kota To: Max Matveev cc: pcp@oss.sgi.com Subject: Re: 'No PMCD agent for domain of request' ? In-Reply-To: <15468.46267.506874.296459@kuku.melbourne.sgi.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-pcp@oss.sgi.com Precedence: bulk Content-Length: 1602 Lines: 47 Thanks, But I dont think anything is wrong with the PMDA as the same PMDA is working on another machine with the same domain number(254) and PCP version number. Any idea? Sadanand On Fri, 15 Feb 2002, Max Matveev wrote: > >>>>> "SK" == Sadanand Kota writes: > > SK> I have checked the pmcd.conf file to see for any conflicts and found > SK> nothing. 
The message at the end of pmcd.log is 'Cleanup "xxx" agent > SK> (dom 254): protocol failure for fd=14, exit(255)' where 'xxx' is name of > SK> my PMDA. > > SK> Also, the installation scripts of PMDA does not give any error messages. > Yes, but pmcd.log does - your PMDA has been started and then exited > unexpectedly, pmcd notices it and cleans up, after that there is no > PMDA to talk to whence the error. > > max > ------------------ORIGINAL MESSAGE--------------------------- -----------------ORIGINAL MESSAGE ----------------------------- Hi, > I have written a PMDA to find the network characteristics of a process. > I have assigned it a domain number number 254 ( in domain.h). > I dont have any other PMDA with same number. Byt when I install the PMDA > and check it with 'pminfo' command, it gives the error message as ' No > PMCD agent for domain of request'. > > I have checked the pmcd.conf file to see for any conflicts and found > nothing. The message at the end of pmcd.log is 'Cleanup "xxx" agent > (dom 254): protocol failure for fd=14, exit(255)' where 'xxx' is name of > my PMDA. > > Also, the installation scripts of PMDA does not give any error messages. > > Any idea to correct it? > From owner-pcp@oss.sgi.com Sat Feb 16 04:49:20 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g1GCnKL27547 for pcp-outgoing; Sat, 16 Feb 2002 04:49:20 -0800 Received: from nexus.adacel.com (shelob.adacel.com.au [203.36.26.146] (may be forged)) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g1GCnF927543 for ; Sat, 16 Feb 2002 04:49:15 -0800 Received: (qmail 15859 invoked from network); 16 Feb 2002 11:36:53 -0000 Received: from unknown (HELO stratos) (144.136.16.193) by nexus.adacel.com with SMTP; 16 Feb 2002 11:36:53 -0000 Reply-To: From: "David Chatterton" To: "'Sadanand Kota'" Cc: Subject: RE: 'No PMCD agent for domain of request' ? 
Date: Sat, 16 Feb 2002 22:48:02 +1100 Message-ID: <000001c1b6df$cb514df0$c1108890@vic.bigpond.net.au> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook 8.5, Build 4.71.2377.0 In-Reply-To: Importance: Normal X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2919.6600 Sender: owner-pcp@oss.sgi.com Precedence: bulk Content-Length: 1199 Lines: 37 > -----Original Message----- > From: owner-pcp@oss.sgi.com [mailto:owner-pcp@oss.sgi.com]On Behalf Of > Sadanand Kota > Sent: Saturday, February 16, 2002 6:41 PM > To: Max Matveev > Cc: pcp@oss.sgi.com > Subject: Re: 'No PMCD agent for domain of request' ? > > > Thanks, > But I dont think anything is wrong with the PMDA as the same PMDA is > working on another machine with the same domain number(254) and PCP > version number. > > Any idea? > The log entry indicates that pmcd is no longer connected to your agent, which is usually caused by your pmda terminating unexpectedly. The error message does not indicate anything is wrong with your domain number for the pmda. Firstly, check that the pmda process is still running. If it is, kill it and restart pmcd (and implicitly your pmda) and check the pmcd log and pmcd metrics to see if pmcd now has a connection to your agent. Then try to access your pmda's metrics with pminfo. > > Also, the installation scripts of PMDA does not give any > error messages. Your pmda may have been responding immediately after installation (since the install script usually checks shortly after restarting pmcd), but it may have died later... 
David

From owner-pcp@oss.sgi.com Sat Feb 16 18:06:34 2002
Received: (from majordomo@localhost)
    by oss.sgi.com (8.11.2/8.11.3) id g1H26YE02144
    for pcp-outgoing; Sat, 16 Feb 2002 18:06:34 -0800
Received: from zok.sgi.com (zok.SGI.COM [204.94.215.101])
    by oss.sgi.com (8.11.2/8.11.3) with SMTP id g1H26R902140
    for ; Sat, 16 Feb 2002 18:06:27 -0800
Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130])
    by zok.sgi.com (8.12.2/8.12.2/linux-outbound_gateway-1.2) with SMTP id g1H26LxV025550
    for ; Sat, 16 Feb 2002 18:06:22 -0800
Received: from kenj-ppp.melbourne.sgi.com (kenj-ppp.melbourne.sgi.com [134.14.55.215])
    by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id MAA11112;
    Sun, 17 Feb 2002 12:05:00 +1100
Date: Sun, 17 Feb 2002 12:04:58 +1100 (EST)
From: kenmcd@melbourne.sgi.com
Reply-To: kenmcd@melbourne.sgi.com
To: Sadanand Kota
cc: pcp@oss.sgi.com
Subject: Re: 'No PMCD agent for domain of request' ?
In-Reply-To:
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-pcp@oss.sgi.com
Precedence: bulk
Content-Length: 1742
Lines: 52

In addition to the comments from everyone else ...

1. Check xxx.log in the same place where pmcd.log is being created ...
   this has to be a daemon PMDA, and by default they all use their name
   to create their own log file ... check also xxx.log.prev which may be
   the previous one if the problem is not deterministic.

2. Turn on PDU tracing in the PMDA ... add
        -D pdu
   to the arguments in pmcd.conf and restart the PMDA (SIGHUP to pmcd
   should suffice). The output will be in xxx.log.

3. Turn on PDU tracing in PMCD
        pmstore pmcd.control.debug 1
   and then restart the PMDA. The additional output will be in pmcd.log.

4. Look for a core file in the directory where pmcd.log is created ...
   check if this is your PMDA.

5. Debug your PMDA with dbpmda ...
this will let you exercise very low-level interactions with PMDA independent of PMCD, but using the same message passing protocols (PDUs) as PMCD will use. If all else fails, post the pmcd and pmda log files from 2. and 3. above. On Fri, 15 Feb 2002, Sadanand Kota wrote: > Hi, > I have written a PMDA to find the network characteristics of a process. > I have assigned it a domain number number 254 ( in domain.h). > I dont have any other PMDA with same number. Byt when I install the PMDA > and check it with 'pminfo' command, it gives the error message as ' No > PMCD agent for domain of request'. > > I have checked the pmcd.conf file to see for any conflicts and found > nothing. The message at the end of pmcd.log is 'Cleanup "xxx" agent > (dom 254): protocol failure for fd=14, exit(255)' where 'xxx' is name of > my PMDA. > > Also, the installation scripts of PMDA does not give any error messages. > > Any idea to correct it? > > Sadanand > > > From owner-pcp@oss.sgi.com Sun Feb 17 18:45:59 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g1I2jxG28294 for pcp-outgoing; Sun, 17 Feb 2002 18:45:59 -0800 Received: from rj.sgi.com (rj.SGI.COM [204.94.215.100]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g1I2jv928291 for ; Sun, 17 Feb 2002 18:45:57 -0800 Received: from nodin.corp.sgi.com (nodin.corp.sgi.com [192.26.51.193]) by rj.sgi.com (8.12.2/8.12.2/linux-outbound_gateway-1.2) with ESMTP id g1I1jptm022313 for ; Sun, 17 Feb 2002 17:45:51 -0800 Received: from kao2.melbourne.sgi.com (kao2.melbourne.sgi.com [134.14.55.180]) by nodin.corp.sgi.com (8.11.4/8.11.2/nodin-1.0) with ESMTP id g1I1ioK26704813; Sun, 17 Feb 2002 17:44:50 -0800 (PST) Received: by kao2.melbourne.sgi.com (Postfix, from userid 16331) id 0529C3000BC; Mon, 18 Feb 2002 11:44:48 +1000 (EST) Received: from kao2.melbourne.sgi.com (localhost [127.0.0.1]) by kao2.melbourne.sgi.com (Postfix) with ESMTP id 8CB2416D; Mon, 18 Feb 2002 12:44:48 +1100 (EST) X-Mailer: exmh version 2.2 06/23/2000 
with nmh-1.0.4 From: Keith Owens To: Mark Goodwin Cc: Uncle Than , pcp@oss.sgi.com Subject: Re: Meminfo confusion In-reply-to: Your message of "Fri, 15 Feb 2002 10:26:09 +1100." Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Mon, 18 Feb 2002 12:44:43 +1100 Message-ID: <11328.1013996683@kao2.melbourne.sgi.com> Sender: owner-pcp@oss.sgi.com Precedence: bulk Content-Length: 612 Lines: 13 On Fri, 15 Feb 2002 10:26:09 +1100 (EST), Mark Goodwin wrote: >Also notice physmem as reported in /proc/meminfo does not >correspond to real physical mem; it's almost the same, but does >not account for a small amount of mem reserved by the kernel. >A way to figure out the exact amount still eludes me .. anyone know? ls -l /proc/kcore | awk '{printf("mem=%dM\n", ($5-4096)/1024/1024)}' Works for me on i386 and ia64. Have not tried it on discontiguous systems. It reports what memory the kernel can see, not what the machine has, which is exactly what we want for performance purposes. From owner-pcp@oss.sgi.com Sun Feb 17 19:01:54 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g1I31sQ28933 for pcp-outgoing; Sun, 17 Feb 2002 19:01:54 -0800 Received: from pneumatic-tube.sgi.com (pneumatic-tube.sgi.com [204.94.214.22]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g1I31n928926 for ; Sun, 17 Feb 2002 19:01:49 -0800 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by pneumatic-tube.sgi.com (980327.SGI.8.8.8-aspam/980310.SGI-aspam) via SMTP id SAA08803 for ; Sun, 17 Feb 2002 18:03:10 -0800 (PST) mail_from (markgw@sgi.com) Received: from sherman.melbourne.sgi.com (sherman.melbourne.sgi.com [134.14.55.232]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id NAA16452; Mon, 18 Feb 2002 13:00:30 +1100 Date: Mon, 18 Feb 2002 13:00:30 +1100 (EST) From: Mark Goodwin X-X-Sender: To: Keith Owens cc: Subject: Re: Meminfo confusion In-Reply-To: <11328.1013996683@kao2.melbourne.sgi.com> Message-ID: 
MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-pcp@oss.sgi.com Precedence: bulk Content-Length: 926 Lines: 25 On Mon, 18 Feb 2002, Keith Owens wrote: > On Fri, 15 Feb 2002 10:26:09 +1100 (EST), > Mark Goodwin wrote: > >Also notice physmem as reported in /proc/meminfo does not > >correspond to real physical mem; it's almost the same, but does > >not account for a small amount of mem reserved by the kernel. > >A way to figure out the exact amount still eludes me .. anyone know? > > ls -l /proc/kcore | awk '{printf("mem=%dM\n", ($5-4096)/1024/1024)}' > > Works for me on i386 and ia64. Have not tried it on discontiguous > systems. It reports what memory the kernel can see, not what the > machine has, which is exactly what we want for performance purposes. > but not what we want for reporting machine h/w inventory, as needed for the hinv.physmem PCP metric. Running this on sherman (2G RAM), this is way off: sherman 1% ls -l /proc/kcore | awk '{printf("mem=%dM\n", ($5-4096)/1024/1024)}' mem=896M From owner-pcp@oss.sgi.com Sun Feb 17 20:13:41 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g1I4DfT04996 for pcp-outgoing; Sun, 17 Feb 2002 20:13:41 -0800 Received: from rj.sgi.com (rj.SGI.COM [204.94.215.100]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g1I4Db904992 for ; Sun, 17 Feb 2002 20:13:37 -0800 Received: from nodin.corp.sgi.com (nodin.corp.sgi.com [192.26.51.193]) by rj.sgi.com (8.12.2/8.12.2/linux-outbound_gateway-1.2) with ESMTP id g1I3DVtm023409 for ; Sun, 17 Feb 2002 19:13:31 -0800 Received: from kao2.melbourne.sgi.com (kao2.melbourne.sgi.com [134.14.55.180]) by nodin.corp.sgi.com (8.11.4/8.11.2/nodin-1.0) with ESMTP id g1I3CUK30991898; Sun, 17 Feb 2002 19:12:30 -0800 (PST) Received: by kao2.melbourne.sgi.com (Postfix, from userid 16331) id 584EE3000BC; Mon, 18 Feb 2002 13:12:29 +1000 (EST) Received: from kao2.melbourne.sgi.com (localhost [127.0.0.1]) by kao2.melbourne.sgi.com (Postfix) with ESMTP id 
    1FEA416D; Mon, 18 Feb 2002 14:12:29 +1100 (EST)
X-Mailer: exmh version 2.2 06/23/2000 with nmh-1.0.4
From: Keith Owens
To: Mark Goodwin
Cc: pcp@oss.sgi.com
Subject: Re: Meminfo confusion
In-reply-to: Your message of "Mon, 18 Feb 2002 13:00:30 +1100."
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Mon, 18 Feb 2002 14:12:23 +1100
Message-ID: <11577.1014001943@kao2.melbourne.sgi.com>
Sender: owner-pcp@oss.sgi.com
Precedence: bulk
Content-Length: 1213
Lines: 29

On Mon, 18 Feb 2002 13:00:30 +1100 (EST),
Mark Goodwin wrote:
>On Mon, 18 Feb 2002, Keith Owens wrote:
>
>> On Fri, 15 Feb 2002 10:26:09 +1100 (EST),
>> Mark Goodwin wrote:
>> >Also notice physmem as reported in /proc/meminfo does not
>> >correspond to real physical mem; it's almost the same, but does
>> >not account for a small amount of mem reserved by the kernel.
>> >A way to figure out the exact amount still eludes me .. anyone know?
>>
>> ls -l /proc/kcore | awk '{printf("mem=%dM\n", ($5-4096)/1024/1024)}'
>>
>> Works for me on i386 and ia64. Have not tried it on discontiguous
>> systems. It reports what memory the kernel can see, not what the
>> machine has, which is exactly what we want for performance purposes.
>>
>
>but not what we want for reporting machine h/w inventory,
>as needed for the hinv.physmem PCP metric.
>
>Running this on sherman (2G RAM), this is way off:
>sherman 1% ls -l /proc/kcore | awk '{printf("mem=%dM\n", ($5-4096)/1024/1024)}'
>mem=896M

That is because sherman is running a kernel that was not compiled for
highmem, which restricts the kernel to 896M of physical memory, so the
value is correct. I will recompile sherman for highmem.
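The awk one-liner boils down to one subtraction and two divisions. A Python sketch of the same arithmetic, for anyone scripting it: kcore_mem_mb is an illustrative name, the 4096-byte allowance is taken straight from the one-liner, and the byte count in the example is an assumption chosen to correspond to the 896M result on sherman.

```python
def kcore_mem_mb(size_bytes, header_bytes=4096):
    """Memory visible to the kernel, in whole MB, from the size of /proc/kcore.

    Mirrors ($5-4096)/1024/1024 from the awk one-liner, truncated to an
    integer the way printf("%d") does for positive values.
    """
    return (size_bytes - header_bytes) // (1024 * 1024)

# Assumed file size for a non-highmem kernel limited to 896M.
assumed_size = 896 * 1024 * 1024 + 4096
print("mem=%dM" % kcore_mem_mb(assumed_size))
```

On a live box the size would come from the file's reported length (the fifth column of ls -l). As discussed above, the result is what the running kernel can see, not what is installed.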
From owner-pcp@oss.sgi.com Mon Feb 18 00:46:17 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g1I8kHQ12567 for pcp-outgoing; Mon, 18 Feb 2002 00:46:17 -0800 Received: from mail.teraport.de ([62.245.135.174]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g1I8kA912564 for ; Mon, 18 Feb 2002 00:46:10 -0800 Received: from TeraPort.de ([10.10.12.32]) by mail.teraport.de (Lotus Domino Release 5.0.7) with ESMTP id 2002021808460233:1220 ; Mon, 18 Feb 2002 08:46:02 +0100 Message-ID: <3C70B13A.83039DFF@TeraPort.de> Date: Mon, 18 Feb 2002 08:46:02 +0100 From: Martin Knoblauch Reply-To: m.knoblauch@TeraPort.de Organization: TeraPort GmbH X-Mailer: Mozilla 4.78 [en] (X11; U; Linux 2.4.18-pre8-K2-VM-24-preempt-lock i686) X-Accept-Language: en, de MIME-Version: 1.0 To: Mark Goodwin CC: m.knoblauch@TeraPort.de, Uncle Than , pcp@oss.sgi.com Subject: Re: Meminfo confusion References: X-MIMETrack: Itemize by SMTP Server on lotus/Teraport/de(Release 5.0.7 |March 21, 2001) at 02/18/2002 08:46:02 AM, Serialize by Router on lotus/Teraport/de(Release 5.0.7 |March 21, 2001) at 02/18/2002 08:46:10 AM, Serialize complete at 02/18/2002 08:46:10 AM Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=us-ascii Sender: owner-pcp@oss.sgi.com Precedence: bulk Content-Length: 2114 Lines: 54 Mark Goodwin wrote: > > On Fri, 15 Feb 2002, Martin Knoblauch wrote: > > > Mark Goodwin wrote: > > > > > The kernel keeps a variable "max_mapnr" which seems to be the number of > > "mapped" physical memory pages. This is closer, but still 16, 32, 64, > > 128 or 256 KB off (I have seen all of these numbers). If you take this > > number and rond it to the next MB/GB, you probably have a very good > > estimate. > > Well I was looking for something more definite than "probably", but > it sounds like it's a better number than existing MemTotal. 
However, > if MemTotal is supposed to be "total physical RAM installed", then we I don't think it is supposed/guaranteed to be the physical memory installed. To me it looks more like "total memory available to this running kernel". Small :-) difference on ia32. > should fix it so it reports the right number (if that is indeed possible). > If the correct interpretation is "total memory available to the kernel > and userland after the kernel has pinched a bit", then we need to add > new field, say "PhysMem". > correct. We need to find the bits and then report PhysMem. > > > > MemTotal: 1029196 kB => 1005.074 MB > > MaxMapped: 1048560 kB => 1023.984 MB (16 KB difference) > > > > So it seems the number is pretty reasonable. Of course, it would be > > cool to find a place that accounts the missing bits. > > yes we should try and find the missing bits (a mystery to be solved). > > > Should I try to > > submit that patch to the kernel and take the flaming about bloat :-) > > how about we work out the real answer, and then submit a patch? > Agreed. What about the "mtrr" suggestion (for IA32)? On the systems I checked, the sum of "write-back" entries was either correct, or complete nonsense (on older 2.4 kernels). 
Martin -- ------------------------------------------------------------------ Martin Knoblauch | email: Martin.Knoblauch@TeraPort.de TeraPort GmbH | Phone: +49-89-510857-309 C+ITS | Fax: +49-89-510857-111 http://www.teraport.de | Mobile: +49-170-4904759 From owner-pcp@oss.sgi.com Mon Feb 18 00:50:52 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g1I8oqY12635 for pcp-outgoing; Mon, 18 Feb 2002 00:50:52 -0800 Received: from mail.teraport.de ([62.245.135.174]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g1I8oh912631 for ; Mon, 18 Feb 2002 00:50:44 -0800 Received: from TeraPort.de ([10.10.12.32]) by mail.teraport.de (Lotus Domino Release 5.0.7) with ESMTP id 2002021808503521:1222 ; Mon, 18 Feb 2002 08:50:35 +0100 Message-ID: <3C70B24B.401F9168@TeraPort.de> Date: Mon, 18 Feb 2002 08:50:35 +0100 From: Martin Knoblauch Reply-To: m.knoblauch@TeraPort.de Organization: TeraPort GmbH X-Mailer: Mozilla 4.78 [en] (X11; U; Linux 2.4.18-pre8-K2-VM-24-preempt-lock i686) X-Accept-Language: en, de MIME-Version: 1.0 To: m.knoblauch@TeraPort.de CC: Keith Owens , pcp@oss.sgi.com Subject: Re: Meminfo confusion References: <11577.1014001943@kao2.melbourne.sgi.com> <3C70AFCD.11523550@TeraPort.de> X-MIMETrack: Itemize by SMTP Server on lotus/Teraport/de(Release 5.0.7 |March 21, 2001) at 02/18/2002 08:50:35 AM, Serialize by Router on lotus/Teraport/de(Release 5.0.7 |March 21, 2001) at 02/18/2002 08:50:44 AM, Serialize complete at 02/18/2002 08:50:44 AM Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=us-ascii Sender: owner-pcp@oss.sgi.com Precedence: bulk Content-Length: 2280 Lines: 53 Martin Knoblauch wrote: > > Keith Owens wrote: > > > > On Mon, 18 Feb 2002 13:00:30 +1100 (EST), > > Mark Goodwin wrote: > > >On Mon, 18 Feb 2002, Keith Owens wrote: > > > > > >> On Fri, 15 Feb 2002 10:26:09 +1100 (EST), > > >> Mark Goodwin wrote: > > >> >Also notice physmem as reported in /proc/meminfo does not > > >> >correspond to real physical mem; it's almost 
the same, but does
> > >> >not account for a small amount of mem reserved by the kernel.
> > >> >A way to figure out the exact amount still eludes me .. anyone know?
> > >>
> > >> ls -l /proc/kcore | awk '{printf("mem=%dM\n", ($5-4096)/1024/1024)}'
> > >>
> > >> Works for me on i386 and ia64. Have not tried it on discontiguous
> > >> systems. It reports what memory the kernel can see, not what the
> > >> machine has, which is exactly what we want for performance purposes.
> > >>
> > >
> > >but not what we want for reporting machine h/w inventory,
> > >as needed for the hinv.physmem PCP metric.
> > >
> > >Running this on sherman (2G RAM), this is way off:
> > >sherman 1% ls -l /proc/kcore | awk '{printf("mem=%dM\n", ($5-4096)/1024/1024)}'
> > >mem=896M
> >
> > Because sherman is running a kernel that was not compiled for highmem.
> > That restricts the kernel to 896M of physical memory, the value is
> > correct. I will recompile sherman for highmem.
>
> Hmm. I have seen this 896MB reporting on kernels with 1GB and more and
> HIGHMEM support compiled in (as shown by "free").
>

 Oops, message did not go to the list. Monday morning caffeine
deprivation.

 Also, as already remarked, the /proc/kcore solution is not giving the
answer to "total physical memory". The numbers are basically in the same
ballpark as the MemTotal from meminfo. So far the best I have seen is
"max_mapnr" (guaranteed to be in the kernel), or the sum of all mtrr
entries with write-back property (not guaranteed to be available, older
2.4 kernels report complete nonsense).
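For anyone who would rather not parse `ls -l` output, the /proc/kcore one-liner quoted above can be sketched in Python. The function names and the constant name are hypothetical; the 4096-byte offset is simply the value subtracted in the awk command, and the 896M figure is the one reported for sherman.

```python
import os

# Offset subtracted in the awk one-liner from this thread.
KCORE_HEADER_BYTES = 4096


def kcore_visible_mb(kcore_size_bytes: int) -> int:
    """Mirror ($5-4096)/1024/1024 printed with %d: whole megabytes."""
    return (kcore_size_bytes - KCORE_HEADER_BYTES) // (1024 * 1024)


def read_kcore_size(path: str = "/proc/kcore") -> int:
    """Return the size that `ls -l` would show in its fifth column."""
    return os.stat(path).st_size


# A kcore size that corresponds to the non-highmem case on sherman.
sherman_like_size = 896 * 1024 * 1024 + KCORE_HEADER_BYTES
print("mem=%dM" % kcore_visible_mb(sherman_like_size))
```

On a live system, `kcore_visible_mb(read_kcore_size())` would stand in for the shell pipeline. As the thread points out, this is the memory the kernel can see, not the installed physical memory.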
Martin -- ------------------------------------------------------------------ Martin Knoblauch | email: Martin.Knoblauch@TeraPort.de TeraPort GmbH | Phone: +49-89-510857-309 C+ITS | Fax: +49-89-510857-111 http://www.teraport.de | Mobile: +49-170-4904759 From owner-pcp@oss.sgi.com Mon Feb 18 10:40:14 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g1IIeEb07348 for pcp-outgoing; Mon, 18 Feb 2002 10:40:14 -0800 Received: from mail5.speakeasy.net (mail5.speakeasy.net [216.254.0.205]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g1IIe3907341 for ; Mon, 18 Feb 2002 10:40:04 -0800 Received: (qmail 2729 invoked from network); 18 Feb 2002 17:39:56 -0000 Received: from unknown (HELO PATCHES) ([216.254.47.170]) (envelope-sender ) by mail5.speakeasy.net (qmail-ldap-1.03) with SMTP for ; 18 Feb 2002 17:39:56 -0000 From: "Corey Cole" To: "SGI-PCP" Subject: RE: Meminfo confusion Date: Mon, 18 Feb 2002 10:40:00 -0700 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2911.0) In-Reply-To: <3C70B24B.401F9168@TeraPort.de> Importance: Normal X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2600.0000 Sender: owner-pcp@oss.sgi.com Precedence: bulk Content-Length: 2913 Lines: 71 While this probably isn't gonna work on every x86 machine ever made, why not go into the DMI info in the BIOS? Alan Cox has some GPL'd code here: ftp://ftp.uk.linux.org/pub/linux/alan/DMI/dmidecode.c that can look in the DMI table. Running it on my Oracle machine comes up with not only the total memory but even what slots are filled, whether the RAM is single or double sided, etc. 
From owner-pcp@oss.sgi.com Mon Feb 18 13:54:27 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g1ILsR611513 for pcp-outgoing; Mon, 18 Feb 2002 13:54:27 -0800 Received: from mail12.speakeasy.net (mail12.speakeasy.net [216.254.0.212]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g1ILsM911508 for ; Mon, 18 Feb 2002 13:54:23 -0800 Received: (qmail 13392 invoked from network); 18 Feb 2002 20:54:16 -0000 Received: from unknown (HELO PATCHES) ([216.254.47.170]) (envelope-sender ) by mail12.speakeasy.net (qmail-ldap-1.03) with SMTP for ; 18 Feb 2002 20:54:16 -0000 From: "Corey Cole" To: "SGI-PCP" Subject: Solaris port Date: Mon, 18 Feb 2002 13:54:20 -0700 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2911.0) Importance: Normal X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2600.0000 Sender: owner-pcp@oss.sgi.com Precedence: bulk Content-Length: 902 Lines: 25 Using Phillip Ezolt's Tru64 patch as a starting point, I've got a sort-of working PCP setup for Solaris 8. As for the sort-of part, here's what's still b0rked. Solaris 8's '/bin/sh' is very broke, so for a while I was using the Korn Shell. When the last patch cluster broke even ksh, I had to resort to bash. As a result, install is very very broken.
I haven't started on the Solaris specific PMDA yet, and a number of the provided PMDAs (cisco, roomtemp, shping) don't compile because of one missing header/function or another. Finally, I'm trying to track down an issue with floating point numbers displaying incorrectly. I had lots of trouble with all the *_MAX macros, so that's what I'm going to check first. Is there any interest in me providing a current patch? Or would [any|every]one rather wait until I get something that at least prints FP numbers correctly? Regards, Corey Cole From owner-pcp@oss.sgi.com Mon Feb 18 15:12:07 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g1INC7J15586 for pcp-outgoing; Mon, 18 Feb 2002 15:12:07 -0800 Received: from yog-sothoth.sgi.com (eugate.sgi.com [192.48.160.10]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g1INC2915583 for ; Mon, 18 Feb 2002 15:12:02 -0800 Received: from rattle.melbourne.sgi.com (rattle.melbourne.sgi.com [134.14.55.145]) by yog-sothoth.sgi.com (980305.SGI.8.8.8-aspam-6.2/980304.SGI-aspam-europe) via ESMTP id XAA58250 for ; Mon, 18 Feb 2002 23:12:06 +0100 (CET) mail_from (kenmcd@melbourne.sgi.com) Received: from localhost (kenmcd@localhost) by rattle.melbourne.sgi.com (SGI-8.9.3/8.9.3) with ESMTP id JAA81232; Tue, 19 Feb 2002 09:10:40 +1100 (EST) X-Authentication-Warning: rattle.melbourne.sgi.com: kenmcd owned process doing -bs Date: Tue, 19 Feb 2002 09:10:40 +1100 From: Ken McDonell Reply-To: kenmcd@sgi.com To: Corey Cole cc: SGI-PCP Subject: Re: Solaris port In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-pcp@oss.sgi.com Precedence: bulk Content-Length: 1194 Lines: 33 I think it would be better to wait until you have something closer to working ... but I'd encourage any others working on PCP for Solaris to communicate with Corey so you can avoid any doubling up of effort. 
On Mon, 18 Feb 2002, Corey Cole wrote: > Using Phillip Ezolt's Tru64 patch as a starting point, I've got > a sort-of working PCP setup for Solaris 8. > > As for the sort-of part, here's what's still b0rked. > > Solaris 8's '/bin/sh' is very broke, so for a while I was using > the Korn Shell. When the last patch cluster broke even ksh, I had > to resort to bash. As a result, install is very very broken. > > I haven't started on the Solaris specific PMDA yet, and a number of > the provided PMDAs (cisco, roomtemp, shping) don't compile because of > one missing header/function or another. > > Finally, I'm trying to track down an issue with floating point numbers > displaying incorrectly. I had lots of trouble with all the *_MAX macros, > so that's what I'm going to check first. > > Is there any interest in me providing a current patch? Or would > [any|every]one > rather wait until I get something that at least prints FP numbers correctly? > > Regards, > > Corey Cole > From owner-pcp@oss.sgi.com Tue Feb 26 12:56:26 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g1QKuQW03632 for pcp-outgoing; Tue, 26 Feb 2002 12:56:26 -0800 Received: from mail11.speakeasy.net (mail11.speakeasy.net [216.254.0.211]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g1QKuC903627 for ; Tue, 26 Feb 2002 12:56:13 -0800 Received: (qmail 12805 invoked from network); 26 Feb 2002 19:56:06 -0000 Received: from unknown (HELO PATCHES) ([216.254.47.170]) (envelope-sender ) by mail11.speakeasy.net (qmail-ldap-1.03) with SMTP for ; 26 Feb 2002 19:56:06 -0000 From: "Corey Cole" To: "SGI-PCP" Subject: Solaris port assistance Date: Tue, 26 Feb 2002 12:56:14 -0700 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2911.0) Importance: Normal X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2600.0000 Sender: owner-pcp@oss.sgi.com 
Precedence: bulk Content-Length: 4463 Lines: 108

I'm stuck and I'd like some assistance. At this point, the client apps
build, but I'm having issues with extracting/displaying values of the types
PM_TYPE_U64, PM_TYPE_FLOAT, and PM_TYPE_DOUBLE.

As my test procedure, I have a linux box side by side with my Solaris
machine interrogating a second linux box's PMCD. I've checked the debug
output at the -D1 level and the PDUs match each other exactly (with the
exception of the timestamp bytes, but that's expected).

For example, I'll fetch disk.all.write (a fairly static value on the target
machine). I get 1261 (4ED) on the linux box, but 17078775686895763456
(ED04000000000000) on the Solaris machine.

Initially I thought I had run up against a gcc/Solaris bug in which bitwise
ands against ull members of unions are hosed. But those areas in "interp.c"
are only in ifdefs that don't show up on Solaris (i.e. HAVE_CAST_U64_DOUBLE
is defined).

Here's the raw capture of pminfo on the Solaris and Linux machines if
someone could take a look and suggest where I need to look...

Solaris:
ccole@tommy-toes> ./pminfo -D1 -f -h sophie disk.all.write
[9638]pmGetPDU: ERROR fd=4 len=20 from=239 moreinput? no
000: 14 7000 ef 0 2010000
[9638]pmXmitPDU: CREDS fd=4 len=20
000: 14 700c 25a6 1 1020000
[9638]pmXmitPDU: PMNS_TRAVERSE fd=4 len=36
000: 24 7010 25a6 0 e 6469736b 2e616c6c 2e777269
008: 74650000
[9638]pmGetPDU: PMNS_NAMES fd=4 len=44 from=239 moreinput? no
000: 2c 700e ef f 0 1 e 6469736b
008: 2e616c6c 2e777269 74657e7e
[9638]pmXmitPDU: PMNS_NAMES fd=4 len=44
000: 2c 700e 25a6 f 0 1 e 6469736b
008: 2e616c6c 2e777269 74657e7e
[9638]pmGetPDU: PMNS_IDS fd=4 len=24 from=239 moreinput? no
000: 18 700d ef 1 1 f000019
[9638]pmXmitPDU: PROFILE fd=4 len=28
000: 1c 7002 25a6 0 0 0 0
[9638]pmXmitPDU: FETCH fd=4 len=32
000: 20 7003 25a6 0 0 0 1 f000019
[9638]pmGetPDU: RESULT fd=4 len=56 from=239 moreinput? no
000: 38 7001 ef 3c7acdce bda5c 1 f000019 1
008: 1 ffffffff b 300000c 0 4ed
pmResult dump from 0x253e0 timestamp: 1014681038.776796 06:50:38.776 numpmid: 1
  60.0.25 (disk.all.write): numval: 1 valfmt: 1 vlist[]:
    value 17078775686895763456
[9638]pmXmitPDU: DESC_REQ fd=4 len=16
000: 10 7004 25a6 f000019
[9638]pmGetPDU: DESC fd=4 len=32 from=239 moreinput? no
000: 20 7005 ef f000019 3 ffffffff 1 100000

disk.all.write
    value 17078775686895763456

Linux:
tigger:~# pminfo -D1 -f -h sophie disk.all.write
[3616]pmGetPDU: ERROR fd=3 len=20 from=239 moreinput? no
000: 14 7000 ef 0 102
[3616]pmXmitPDU: CREDS fd=3 len=20
000: 14 700c e20 1000000 201
[3616]pmXmitPDU: PMNS_TRAVERSE fd=3 len=36
000: 24 7010 e20 0 e000000 6b736964 6c6c612e 6972772e
008: 6574
[3616]pmGetPDU: PMNS_NAMES fd=3 len=44 from=239 moreinput? no
000: 2c 700e ef f000000 0 1000000 e000000 6b736964
008: 6c6c612e 6972772e 7e7e6574
[3616]pmXmitPDU: PMNS_NAMES fd=3 len=44
000: 2c 700e e20 f000000 0 1000000 e000000 6b736964
008: 6c6c612e 6972772e 7e7e6574
[3616]pmGetPDU: PMNS_IDS fd=3 len=24 from=239 moreinput? no
000: 18 700d ef 1000000 1000000 1900000f
[3616]pmXmitPDU: PROFILE fd=3 len=28
000: 1c 7002 e20 0 0 0 0
[3616]pmXmitPDU: FETCH fd=3 len=32
000: 20 7003 e20 0 0 0 1000000 1900000f
[3616]pmGetPDU: RESULT fd=3 len=56 from=239 moreinput? no
000: 38 7001 ef c0cd7a3c 865e0100 1000000 1900000f 1000000
008: 1000000 ffffffff b000000 c000003 0 ed040000
pmResult dump from 0x804d238 timestamp: 1014681024.089734 16:50:24.089 numpmid: 1
  60.0.25 (disk.all.write): numval: 1 valfmt: 1 vlist[]:
    value 1261
[3616]pmXmitPDU: DESC_REQ fd=3 len=16
000: 10 7004 e20 1900000f
[3616]pmGetPDU: DESC fd=3 len=32 from=239 moreinput? no
000: 20 7005 ef 1900000f 3000000 ffffffff 1000000 1000

disk.all.write
    value 1261

(sophie is the target host, tigger is the linux test platform, and
tommy-toes is the Solaris build machine...)
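One way to compare the two captures is to byte-swap the 32-bit words from the Linux dump, on the assumption (inferred from the numbers, not stated anywhere in the thread) that the Linux -D1 output prints each word in the little-endian host order while the wire data is big-endian. The helper `swap32` is a hypothetical name; the hex words are copied from the RESULT PDUs above.

```python
# Sketch: byte-swap 32-bit words from the Linux RESULT dump and compare
# them with the decoded pmResult values. swap32 is a made-up helper name.

def swap32(word: int) -> int:
    """Reverse the byte order of a 32-bit word."""
    return int.from_bytes(word.to_bytes(4, "little"), "big")


# Words taken from the Linux RESULT PDU in the capture.
linux_sec, linux_usec = 0xC0CD7A3C, 0x865E0100
linux_pmid, linux_value_low = 0x1900000F, 0xED040000

# Words taken from the Solaris RESULT PDU in the capture.
solaris_sec, solaris_usec = 0x3C7ACDCE, 0xBDA5C
solaris_pmid, solaris_value_low = 0xF000019, 0x4ED

print("linux timestamp  :", swap32(linux_sec), swap32(linux_usec))
print("solaris timestamp:", solaris_sec, solaris_usec)
print("pmid matches     :", swap32(linux_pmid) == solaris_pmid)
print("value low word   :", swap32(linux_value_low), solaris_value_low)
```

After the swap both machines carry the same PMID and the same low word 0x4ED (1261), which suggests the bytes on the wire agree and the difference lies in how the Solaris client interprets the 64-bit value.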
Thanks, Corey From owner-pcp@oss.sgi.com Wed Feb 27 09:25:28 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g1RHPSl09880 for pcp-outgoing; Wed, 27 Feb 2002 09:25:28 -0800 Received: from zok.sgi.com (zok.sgi.com [204.94.215.101]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g1RHPP909877 for ; Wed, 27 Feb 2002 09:25:25 -0800 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by zok.sgi.com (8.12.2/8.12.2/linux-outbound_gateway-1.2) with SMTP id g1R8qUFH028985 for ; Wed, 27 Feb 2002 00:52:31 -0800 Received: from kuku.melbourne.sgi.com (kuku.melbourne.sgi.com [134.14.55.163]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id SAA21241; Wed, 27 Feb 2002 18:51:13 +1100 Received: (from makc@localhost) by kuku.melbourne.sgi.com (SGI-8.9.3/8.9.3) id SAA24212; Wed, 27 Feb 2002 18:51:12 +1100 (EST) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15484.36848.781061.500847@kuku.melbourne.sgi.com> Date: Wed, 27 Feb 2002 18:51:12 +1100 From: Max Matveev To: "Corey Cole" Cc: "SGI-PCP" Subject: Re: Solaris port assistance In-Reply-To: References: X-Mailer: VM 7.00 under 21.4 (patch 3) "Academic Rigor" XEmacs Lucid Sender: owner-pcp@oss.sgi.com Precedence: bulk Content-Length: 693 Lines: 17

>>>>> "CC" == Corey Cole writes:

 CC> For example, I'll fetch disk.all.write (a fairly static value on
 CC> the target machine). I get 1261 (4ED) on the linux box, but
 CC> 17078775686895763456 (ED04000000000000) on the Solaris machine.

Welcome to endianness hell - you might want to write your own
__htonll and __htonf routines to make sure that the stuff you put on the
wire and the stuff you read from the wire matches the internal Solaris
representation.

Start with stuffing a value like 0x123456789ABCDEF0 into a Solaris long long
and print it using %llx or some other Solaris incantation for long long
prints. Repeat on Linux, figure out the differences and write the routines.
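The two numbers in this exchange are consistent with a plain 64-bit byte swap, which can be checked with Python's struct module. This is an illustration only, not the PCP conversion code; the function name `htonll_bytes` is hypothetical and merely echoes the __htonll routine suggested above.

```python
import struct

# Hypothetical helper echoing the suggested __htonll: encode an unsigned
# 64-bit value in network (big-endian) byte order.
def htonll_bytes(value: int) -> bytes:
    return struct.pack("!Q", value)


wire = htonll_bytes(1261)                    # bytes as sent on the wire
correct = struct.unpack(">Q", wire)[0]       # decoded as big-endian
misread = struct.unpack("<Q", wire)[0]       # decoded with bytes reversed

print(wire.hex(), correct, misread, hex(misread))

# The probe value from the mail, laid out in both byte orders.
probe = 0x123456789ABCDEF0
probe_big = struct.pack(">Q", probe).hex()
probe_little = struct.pack("<Q", probe).hex()
print(probe_big, probe_little)
```

The misread value comes out as 17078775686895763456 (0xED04000000000000), the figure seen on the Solaris machine, so an extra or missing 64-bit swap is enough to explain the symptom.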
max From owner-pcp@oss.sgi.com Wed Feb 27 12:18:01 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g1RKI1Z15938 for pcp-outgoing; Wed, 27 Feb 2002 12:18:01 -0800 Received: from mail5.speakeasy.net (mail5.speakeasy.net [216.254.0.205]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g1RKHu915934 for ; Wed, 27 Feb 2002 12:17:56 -0800 Received: (qmail 2441 invoked from network); 27 Feb 2002 19:17:51 -0000 Received: from unknown (HELO PATCHES) ([216.254.47.170]) (envelope-sender ) by mail5.speakeasy.net (qmail-ldap-1.03) with SMTP for ; 27 Feb 2002 19:17:51 -0000 From: "Corey Cole" To: "SGI-PCP" Subject: RE: Solaris port assistance Date: Wed, 27 Feb 2002 12:18:01 -0700 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2911.0) In-Reply-To: <15484.36848.781061.500847@kuku.melbourne.sgi.com> Importance: Normal X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2600.0000 Sender: owner-pcp@oss.sgi.com Precedence: bulk Content-Length: 1510 Lines: 40 I'd assumed that since SGI-MIPS machines were big-endian and x86 (and Alpha) machines were little-endian that this issue had already been solved. Not only that, but I believe there's also been a successful build on Sparc-Linux. Then again, what is it they say about assume? ;) What bothers me is that int32 and uint32 values work fine, but float, which is also 32 bits on Solaris, doesn't. I mean, I can understand U64 and double, but why float? I broke down and ordered Sun's Solaris porting guide, hoping that might have a tidbit or two. Until then, I'm checking out Sun white papers... 
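On the "why float?" question, a small sketch shows that a 32-bit float is just as sensitive to byte order as a 32-bit integer. Whether PCP's integer path works because it goes through the system's own ntohl while the float path depends on a byte-order macro is an assumption here, not something the thread confirms. The values 1261 and 1.5 are arbitrary test inputs.

```python
import struct

# A 32-bit integer and a 32-bit float, both encoded in network byte order.
wire_int = struct.pack("!I", 1261)
wire_float = struct.pack("!f", 1.5)

# Decoding with the right byte order recovers both values.
good_int = struct.unpack("!I", wire_int)[0]
good_float = struct.unpack("!f", wire_float)[0]

# Decoding the float with the bytes reversed yields a tiny denormal value
# instead of 1.5, the kind of nonsense a wrong byte-order assumption gives.
bad_float = struct.unpack("<f", wire_float)[0]

print(wire_int.hex(), good_int)
print(wire_float.hex(), good_float, bad_float)
```

So the size of the type is not what matters; any multi-byte value that is converted by code keyed on a wrong byte-order definition will come out scrambled.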
From owner-pcp@oss.sgi.com Wed Feb 27 13:40:53 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g1RLerE17825 for pcp-outgoing; Wed, 27 Feb 2002 13:40:53 -0800 Received: from mail12.speakeasy.net (mail12.speakeasy.net [216.254.0.212]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g1RLej917821 for ; Wed, 27 Feb 2002 13:40:45 -0800 Received: (qmail 20741 invoked from network); 27 Feb 2002 20:40:39 -0000 Received: from unknown (HELO PATCHES) ([216.254.47.170]) (envelope-sender ) by mail12.speakeasy.net (qmail-ldap-1.03) with SMTP for ; 27 Feb 2002 20:40:39 -0000 From: "Corey Cole" To: "SGI-PCP" Subject: RE: Solaris port assistance Date: Wed, 27 Feb 2002 13:40:49 -0700 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2911.0) Importance: Normal In-Reply-To: X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2600.0000 Sender: owner-pcp@oss.sgi.com Precedence: bulk Content-Length: 2443 Lines: 67 Probably bad form to reply to my own
message, but it looks like I found the problem. Where many OSen define byte order in endian.h or sys/endian.h, Solaris puts it in sys/byteorder.h (which then uses sys/isa_defs.h to check for sparc v[7|8|9] or i386 or ia64). Not only that, but they put a leading underscore on BIG_ENDIAN. Anyways, adding a check for sys/byteorder.h and adding some more magic to platform_defs.h.in fixed the problem. Now all that's left is to work out the install issues and add the ability to create Solaris "pkg" files. Then I'll confirm I don't break anything on Linux and submit the patch (I'm still working on getting my Indy working...) Creating Solaris pmdas will be a task for another time ;)
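The byte-order problem discussed in this thread came down to a compile-time macro that was not being picked up. A runtime probe is a common cross-check for that kind of configure logic. The sketch below shows the idea in Python; the function name `host_is_big_endian` is hypothetical and this is not how PCP's platform_defs.h.in does it.

```python
import struct
import sys

def host_is_big_endian() -> bool:
    """Probe the host byte order by packing the 16-bit value 1 natively.

    On a big-endian host the most significant byte comes first, so the
    native encoding is 00 01; on a little-endian host it is 01 00.
    """
    return struct.pack("=H", 1) == b"\x00\x01"


# Cross-check the probe against what the interpreter itself reports.
probe_agrees = host_is_big_endian() == (sys.byteorder == "big")
print("big-endian host:", host_is_big_endian(), "probe agrees:", probe_agrees)
```

A check like this does not depend on which header a given OS uses to publish BIG_ENDIAN or _BIG_ENDIAN, which is the portability trap described above.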