Ken,
On 5/7/15 10:27 PM, Ken McDonell wrote:
But unfortunately in this case the PMDA implementor says the metrics
are seconds, but they are not (see the example below) ... this is a
BUG .. the PMDA should return hotproc.psinfo.start_time either /100
and call it seconds or *10 and call it msec (and increase it to
64-bits in this case).
# pminfo -f proc.psinfo.start_time | sed -e 's/.* value //' | sort -nr
| head -2 ; pminfo -f kernel.all.uptime
9086825
9086825
kernel.all.uptime
value 90868
So I started looking into this. Looks like this has been there forever
(hotproc just uses the same semantics defined in proc). But I have added
other proc metrics that may need fixing as well.
While the change for this metric (to ms and 64 bit to match the
rest of the "time" metrics in proc) seems correct and straightforward, I
have some confusion on the types in general for some of the proc time
metrics.
Apologies for the long description, and maybe I have a mistake somewhere.
If we were to model this change after a similar metric (on Centos 6 x86_64):
proc.psinfo.utime PMID: 3.8.13 = 12591117 = 0xc0200d
Data Type: 64-bit unsigned int InDom: 3.9 0xc00009
Semantics: counter Units: millisec
Which is defined as (in pmda.c):
/* proc.psinfo.utime */
{ NULL,
{ PMDA_PMID(CLUSTER_PID_STAT,13), KERNEL_ULONG, PROC_INDOM,
PM_SEM_COUNTER,
PMDA_PMUNITS(0,1,0,0,PM_TIME_MSEC,0) } },
where KERNEL_ULONG comes from ../linux/convert.h:
#if defined(HAVE_64BIT_LONG)
#define KERNEL_ULONG PM_TYPE_U64
#define _pm_assign_ulong(atomp, val) do { (atomp)->ull = (val); } while (0)
#else
#define KERNEL_ULONG PM_TYPE_U32
#define _pm_assign_ulong(atomp, val) do { (atomp)->ul = (val); } while (0)
#endif
Where "man 5 proc" says about the source type:
utime %lu
Which on this platform should be 64 bit.
But the assignment from the source proc data does this explicit 32bit
cast (pmda.c):
case PROC_PID_STAT_UTIME: /* proc.psinfo.utime */
case PROC_PID_STAT_STIME: /* proc.psinfo.stime */
case PROC_PID_STAT_CUTIME: /* proc.psinfo.cutime */
case PROC_PID_STAT_CSTIME: /* proc.psinfo.cstime */
/* unsigned jiffies converted to unsigned msecs */
f = _pm_getfield(entry->stat_buf, idp->item);
if (f == NULL)
return 0;
ul = (__uint32_t)strtoul(f, &tail, 0);
_pm_assign_ulong(atom, 1000 * (double)ul / hz);
break;
Since i think the source value is centi-seconds (on this platform
anyway), that cast will cause truncation in less than a cpuyear if my
math is correct.
So even though: proc.psinfo.utime is defined as 64 bits, "ul" in the
pmda is defined as "unsigned long" (64 bits) and the _pm_assign_ulong
macro assigns to ull, it will never report a value larger than
max(__uint32_t)*1000/hz, due to the __uint32_t cast.
It appears the solution is to do a conditional cast based on
HAVE_64BIT_LONG. So augment the above to something like:
#if defined(HAVE_64BIT_LONG)
#define KERNEL_ULONG PM_TYPE_U64
#define _pm_assign_ulong(atomp, val) do { (atomp)->ull = (val); } while (0)
#define __pm_kernel_ulong_t __uint64_t
#else
#define KERNEL_ULONG PM_TYPE_U32
#define _pm_assign_ulong(atomp, val) do { (atomp)->ul = (val); } while (0)
#define __pm_kernel_ulong_t __uint32_t
#endif
and then change the cast to :
ul = (__pm_kernel_ulong_t)strtoul(f, &tail, 0);
There are a couple other spots where this could be done in the pmda as well.
Does that sound reasonable, or some other cleanup with respect to data
width?
Martins
|