
RE: disk.dev.avactive and disk.dev.aveq on RH Advanced Server

To: "Davis, Todd C" <todd.c.davis@xxxxxxxxx>
Subject: RE: disk.dev.avactive and disk.dev.aveq on RH Advanced Server
From: kenmcd@xxxxxxxxxxxxxxxxx
Date: Fri, 21 Mar 2003 06:01:55 +1100 (EST)
Cc: "'pcp@xxxxxxxxxxx'" <pcp@xxxxxxxxxxx>
In-reply-to: <29AD895CE780D511A8870002A50A666D04F90988@hdsmsx106.hd.intel.com>
Reply-to: kenmcd@xxxxxxxxxxxxxxxxx
Sender: pcp-bounce@xxxxxxxxxxx
I should read _all_ my mail backlog before responding.

Yep Todd, this looks busted.  But I believe the conversion in the Linux
PMDA dates from an earlier version of the sard patch (or similar) that
did _not_ already convert the times to milliseconds.  I'll snoop around
some and see if it looks safe to rip the conversion out of the Linux
PMDA.  I've checked locally and the numbers are definitely too high
by a factor of 10 on Linux 2.4.20 + xfs ... I've opened a bug to
track this within SGI ... we'll spin the first "dev" version of PCP 2.5
with the fix for this.
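
To make the factor of 10 concrete, here is a rough sketch (not the actual
kernel or PCP source, and assuming the stock HZ == 100 on 2.4 x86) of what
the double conversion does to a busy-time counter:

    /* Hedged sketch only: shows why the exported times come out
     * 1000/HZ (i.e. 10x at HZ == 100) too high when the milliseconds
     * conversion is applied in both the sard patch and the Linux PMDA. */
    #include <stdio.h>

    #define HZ 100                          /* assumed 2.4 x86 default */
    #define MSEC(x) ((x) * 1000 / HZ)       /* same form as the sard patch */

    int main(void)
    {
        unsigned long io_ticks  = 250;                   /* 250 jiffies busy */
        unsigned long in_kernel = MSEC(io_ticks);        /* 2500 ms, already correct */
        unsigned long in_pmda   = 1000 * in_kernel / HZ; /* 25000 -- 10x too high */

        printf("kernel reports %lu ms, PMDA exports %lu ms\n",
               in_kernel, in_pmda);
        return 0;
    }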

However, none of this changes my basic belief that disk.dev.aveq is
not helpful in monitoring disk performance ... contrary to what I said
earlier, disk.dev.avactive is useful and we need this to be correct.

As to the second ull question, the basic guidance is that if the native
instrumentation uses 32-bit counters, and you start to add these up (as in
disk.all.x = sum (disk.dev.x)) then the sum should be 64-bit to avoid
overflow.  Similar issues arise when you add reads + writes to give
totals, or convert from blocks to Kbytes, or ticks to msec.  I'll review
the data types for the disk metrics to make sure this is correct.
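
To illustrate the point, a rough sketch (illustrative values only, not the
PMDA source) of why summing per-disk 32-bit counters needs a 64-bit
accumulator, which is what atom->ull gives us in the pmAtomValue cases:

    /* Hedged sketch only: two 32-bit per-disk counters near the 2^32
     * wrap point overflow a 32-bit sum, while a 64-bit accumulator
     * keeps the correct total. */
    #include <stdio.h>
    #include <stdint.h>
    #include <inttypes.h>

    int main(void)
    {
        uint32_t sda_blocks = 3000000000U;   /* per-disk counters near 2^32 */
        uint32_t sdb_blocks = 2000000000U;

        uint32_t sum32 = sda_blocks + sdb_blocks;           /* wraps modulo 2^32 */
        uint64_t sum64 = (uint64_t)sda_blocks + sdb_blocks; /* correct total */

        printf("32-bit sum: %" PRIu32 "\n", sum32);   /* 705032704 (overflowed) */
        printf("64-bit sum: %" PRIu64 "\n", sum64);   /* 5000000000 */
        return 0;
    }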

Thanks Todd.

On Wed, 19 Mar 2003, Davis, Todd C wrote:

> I think I found part of the bug for these metrics. The metrics are being
> converted to milliseconds twice, in the sard patch and in the Linux pmda:
> 
> From the sard patch:
> +#define MSEC(x) ((x) * 1000 / HZ)
> +                                     MSEC(hd->rd_ticks),
> +                                     hd->wr_ios, hd->wr_merges,
> +                                     hd->wr_sectors,
> +                                     MSEC(hd->wr_ticks),
> +                                     hd->ios_in_flight,
> +                                     MSEC(hd->io_ticks),
> +                                     MSEC(hd->aveq));
> +#undef MSEC
> 
> From the Linux pmda:
>               case 46: /* disk.dev.avactive */
>                   atom->ul = 1000 * p->io_ticks / proc_stat.hz;
>                   break;
>               case 47: /* disk.dev.aveq */
>                   atom->ul = 1000 * p->aveq / proc_stat.hz;
>                   break;
> 
> 
>                   case 44: /* disk.all.avactive */
>                       atom->ull += 1000 * p->io_ticks / proc_stat.hz;
>                       break;
>                   case 45: /* disk.all.aveq */
>                       atom->ull += 1000 * p->aveq / proc_stat.hz;
>                       break;
> 
> Also shouldn't all the metrics be ull?
> 
> Todd C. Davis
> These are my opinions and absolutely not official opinions of Intel Corp.
>  
> -----Original Message-----
> From: Davis, Todd C 
> Sent: Tuesday, March 18, 2003 11:36 AM
> To: 'pcp@xxxxxxxxxxx'
> Subject: disk.dev.avactive and disk.dev.aveq on RH Advanced Server
> 
> 
> With no disk activity on the system I see disk.dev.avactive and
> disk.dev.aveq on the root drive. The last sample had some disk io but the
> disk.dev.avactive number did not change and disk.dev.aveq number did not
> change significantly. Are these metrics supposed to be accurate? They look
> bogus to me. 
> 
> I am running RedHat Advanced Server with a 2.4.18 kernel with the sard patch
> applied.
> 
> The script:
> 
> pmie -f -e -V <<pmie.end
> //
> // Watch average disk utilization and average queue length
> //
> myhost = "localhost";                 // the host of interest
> delta = 3 sec;
> Block_total =
>     disk.dev.blktotal :\$myhost;
> Average_disk_utilization =
>     disk.dev.avactive :\$myhost;
> Average_queue_length =
>     disk.dev.aveq :\$myhost;
> 
> pmie.end
> 
> The output:
> 
> Block_total (Tue Mar 18 11:09:06 2003): ? ?
> Average_disk_utilization (Tue Mar 18 11:09:06 2003): ? ?
> Average_queue_length (Tue Mar 18 11:09:06 2003): ? ?
> 
> Block_total (Tue Mar 18 11:09:09 2003):
>     localhost: [sda] 0
>     localhost: [sdb] 0
> Average_disk_utilization (Tue Mar 18 11:09:09 2003):
>     localhost: [sda] 10.0
>     localhost: [sdb] 0
> Average_queue_length (Tue Mar 18 11:09:09 2003):
>     localhost: [sda] 30.1
>     localhost: [sdb] 0
> 
> Block_total (Tue Mar 18 11:09:12 2003):
>     localhost: [sda] 0
>     localhost: [sdb] 0
> Average_disk_utilization (Tue Mar 18 11:09:12 2003):
>     localhost: [sda] 10.0
>     localhost: [sdb] 0
> Average_queue_length (Tue Mar 18 11:09:12 2003):
>     localhost: [sda] 30.0
>     localhost: [sdb] 0
> 
> Block_total (Tue Mar 18 11:09:15 2003):
>     localhost: [sda] 0
>     localhost: [sdb] 0
> Average_disk_utilization (Tue Mar 18 11:09:15 2003):
>     localhost: [sda] 10.0
>     localhost: [sdb] 0
> Average_queue_length (Tue Mar 18 11:09:15 2003):
>     localhost: [sda] 30.0
>     localhost: [sdb] 0
> 
> Block_total (Tue Mar 18 11:09:18 2003):
>     localhost: [sda] 0
>     localhost: [sdb] 0
> Average_disk_utilization (Tue Mar 18 11:09:18 2003):
>     localhost: [sda] 10.0
>     localhost: [sdb] 0
> Average_queue_length (Tue Mar 18 11:09:18 2003):
>     localhost: [sda] 30.0
>     localhost: [sdb] 0
> 
> Block_total (Tue Mar 18 11:09:21 2003):
>     localhost: [sda] 85
>     localhost: [sdb] 0
> Average_disk_utilization (Tue Mar 18 11:09:21 2003):
>     localhost: [sda] 10.0
>     localhost: [sdb] 0
> Average_queue_length (Tue Mar 18 11:09:21 2003):
>     localhost: [sda] 30.9
>     localhost: [sdb] 0
> 
> Todd C. Davis
> These are my opinions and absolutely not official opinions of Intel Corp.
> Telco Systems Development
> Intel Corporation, Columbia Design Center
> CBA-2, Suite 100
> 250 Berry Hill Road
> Columbia, SC 29210
> (803) 461-6108
> fax: (803) 461-6292
> mailto:todd.c.davis@xxxxxxxxx
>  
> 
> 
> 


