On Wed, 19 Mar 2003, Davis, Todd C wrote:
> I think I found part of the bug for these metrics. The metrics are being
> converted to milliseconds twice, in the sard patch and in the Linux pmda:
Well spotted.
The reason we had not found this earlier is that most of our Linux
testing has been done on machines where HZ is 1024, so the extra
* 1000 / 1024 scaling introduces only a small error (the result is
about 2.3% too low).
Of course if HZ is 100, this error makes the numbers too big by a factor
of 10.
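To make the size of the error concrete, here is a small sketch of the two
conversions applied back to back. The function names are hypothetical
stand-ins for the sard MSEC() macro and the pmda arithmetic quoted below,
and the 64-bit intermediate is my own choice to keep the example free of
overflow, not something taken from either code base.

```c
#include <stdint.h>

/* Stand-in for the sard patch's MSEC(): kernel ticks -> milliseconds. */
static uint32_t kernel_msec(uint32_t ticks, uint32_t hz)
{
    return (uint32_t)((uint64_t)ticks * 1000 / hz);
}

/* Stand-in for the pmda code: treats its input as ticks and scales again. */
static uint32_t pmda_msec(uint32_t value, uint32_t hz)
{
    return (uint32_t)(1000ULL * value / hz);
}

/* What the pmda ends up exporting when fed an already-converted value. */
static uint32_t double_converted(uint32_t ticks, uint32_t hz)
{
    return pmda_msec(kernel_msec(ticks, hz), hz);
}
```

For one second of activity, double_converted(100, 100) gives 10000 instead
of 1000 (a factor of 10 too big), while double_converted(1024, 1024) gives
976 instead of 1000, which is easy to miss.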
> From the sard patch:
> +#define MSEC(x) ((x) * 1000 / HZ)
> + MSEC(hd->rd_ticks),
> + hd->wr_ios, hd->wr_merges,
> + hd->wr_sectors,
> + MSEC(hd->wr_ticks),
> + hd->ios_in_flight,
> + MSEC(hd->io_ticks),
> + MSEC(hd->aveq));
> +#undef MSEC
>
> From the Linux pmda:
> case 46: /* disk.dev.avactive */
> atom->ul = 1000 * p->io_ticks / proc_stat.hz;
> break;
> case 47: /* disk.dev.aveq */
> atom->ul = 1000 * p->aveq / proc_stat.hz;
> break;
>
>
> case 44: /* disk.all.avactive */
> atom->ull += 1000 * p->io_ticks / proc_stat.hz;
> break;
> case 45: /* disk.all.aveq */
> atom->ull += 1000 * p->aveq / proc_stat.hz;
> break;
>
> Also shouldn't all the metrics be ull?
Not necessarily. The raw data from /proc/partitions is a 32-bit number,
hence the _per disk_ metrics are ul. Once we start adding these together
to produce the _all_ metrics, we need ull to avoid overflow (within
SGI we often have several hundred disks on one Linux system,
so this is a real issue for us).
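A minimal sketch of why the sum needs the wider type, using uint32_t and
uint64_t as stand-ins for ul and ull. The helper names are hypothetical
and are not part of the pmda.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum 32-bit per-disk values into a 32-bit total: wraps modulo 2^32. */
static uint32_t sum_ul(const uint32_t *per_disk, size_t ndisks)
{
    uint32_t total = 0;
    for (size_t i = 0; i < ndisks; i++)
        total += per_disk[i];
    return total;
}

/* Same sum into a 64-bit total, as the disk.all metrics do with ull. */
static uint64_t sum_ull(const uint32_t *per_disk, size_t ndisks)
{
    uint64_t total = 0;
    for (size_t i = 0; i < ndisks; i++)
        total += per_disk[i];
    return total;
}
```

With just three disks each reporting 0x80000000, sum_ull() returns
6442450944 while sum_ul() has already wrapped to 2147483648; with hundreds
of disks the 32-bit total wraps far sooner.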