
RE: disk.dev.avactive and disk.dev.aveq on RH Advanced Server

To: "Davis, Todd C" <todd.c.davis@xxxxxxxxx>
Subject: RE: disk.dev.avactive and disk.dev.aveq on RH Advanced Server
From: kenmcd@xxxxxxxxxxxxxxxxx
Date: Sun, 13 Apr 2003 09:11:18 +1000 (EST)
Cc: "'pcp@xxxxxxxxxxx'" <pcp@xxxxxxxxxxx>
In-reply-to: <29AD895CE780D511A8870002A50A666D04F90988@xxxxxxxxxxxxxxxxxxxxxx>
Reply-to: kenmcd@xxxxxxxxxxxxxxxxx
Sender: pcp-bounce@xxxxxxxxxxx
On Wed, 19 Mar 2003, Davis, Todd C wrote:

> I think I found part of the bug for these metrics. The metrics are being
> converted to milliseconds twice, in the sard patch and in the Linux pmda:

Well spotted.

The reason we had not found this earlier is that most of our Linux
testing has been done on machines where HZ is 1024, so the extra
* 1000 / 1024 scaling introduces only a very small error (about 2.3%).

Of course if HZ is 100, this error makes the numbers too big by a factor
of 10.

> From the sard patch:
> +#define MSEC(x) ((x) * 1000 / HZ)
> +                                     MSEC(hd->rd_ticks),
> +                                     hd->wr_ios, hd->wr_merges,
> +                                     hd->wr_sectors,
> +                                     MSEC(hd->wr_ticks),
> +                                     hd->ios_in_flight,
> +                                     MSEC(hd->io_ticks),
> +                                     MSEC(hd->aveq));
> +#undef MSEC
> 
> From the Linux pmda:
>               case 46: /* disk.dev.avactive */
>                   atom->ul = 1000 * p->io_ticks / proc_stat.hz;
>                   break;
>               case 47: /* disk.dev.aveq */
>                   atom->ul = 1000 * p->aveq / proc_stat.hz;
>                   break;
> 
> 
>                   case 44: /* disk.all.avactive */
>                       atom->ull += 1000 * p->io_ticks / proc_stat.hz;
>                       break;
>                   case 45: /* disk.all.aveq */
>                       atom->ull += 1000 * p->aveq / proc_stat.hz;
>                       break;
> 
> Also shouldn't all the metrics be ull?

Not necessarily.  The raw data from /proc/partitions is a 32-bit number,
hence the _per disk_ metrics are ul.  Once we start adding these together
to produce the _all_ metrics, we need ull to avoid overflows (within
SGI we often have multiple hundreds of disks on one Linux system,
so this is a real issue for us).


