
Re: [pcp] PCP Network Latency PMDA

To: pcp@xxxxxxxxxxx
Subject: Re: [pcp] PCP Network Latency PMDA
From: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date: Fri, 20 Jun 2014 07:15:43 +1000
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <53A34A47.3060008@xxxxxxxxxx>
References: <53A34A47.3060008@xxxxxxxxxx>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.5.0
Will, welcome to the list ...

On 20/06/14 06:38, William Cohen wrote:

...The thought is that it would probably
be sufficient to provide metrics for latency for packet send and
receive of each network device.

Sounds like a good idea and a natural extension to the coverage of the existing network interface metrics.

... I have some questions on implementing
the performance metric names.  Thinking maybe something like the
following names:

network.interface.in.latency instance "devname"
network.interface.out.latency instance "devname"

The value would be the average latency on the device.  This would be
similar to the kernel.all.cpu.* metrics in the respect that the
latencies would be averaged over some window of time.  Would it be
better to provide raw monotonic sums of the delays and number of
network packets in the pmda and have pcp derive the average latency
from the raw metric values? ...

As a general rule we're very strongly in favour of PMDAs exporting running counters when they are available. This reduces state and complexity in the PMDA, avoids hard-wiring an "average time period" into the PMDA, and dodges the whole (statistically) messy business of rolling time averages and the delays they introduce into calculation and reporting ... all of these issues dilute the semantic value of the exported data and are better handled by the clients.

For simple counters, every PCP client can do its own averaging to match its needs, and multiple clients sampling the same metrics at different times and rates still get consistent results.
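
For instance, pmval(1) will rate-convert any metric with counter semantics over whatever sampling interval the user picks, so something like

    $ pmval -t 2sec network.interface.in.packets
    $ pmval -t 1min network.interface.in.packets

gives per-second rates averaged over 2 second and 1 minute windows respectively, all from the same exported counter.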

Now the data you're talking about probably requires the more complicated delta(v1) / delta(v2) calculation for a user-facing client (but not pmlogger), where v1 is the sum of latencies and v2 is the packet count. But this can be automagically produced by "derived metrics" that a client can choose to use (see an example in the second para of the pmRegisterDerived(3) man page), or done by pmie(1).
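
To make that concrete, with hypothetical metric names since the PMDA doesn't exist yet, a derived metric along these lines would give a monitoring client the average latency per packet over each of its own sampling intervals:

    network.interface.in.avg_latency = delta(network.interface.in.latency_total) / delta(network.interface.in.packets)

where latency_total would be the running sum of per-packet latencies exported by the new PMDA, and network.interface.in.packets is the existing packet counter.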

... Any thoughts or comments about this proposed network latency PMDA?

For the other network interface metrics, we only incur an overhead when the data is requested (when the metrics are not being requested there is zero additional cost because the instrumentation is already being done in the kernel). This might not be the case here, where I expect the instrumentation would need to be enabled, even if the metrics are not being requested.

Depending on the overhead of this instrumentation, you may wish to consider an additional control metric and pmstore(1) to allow a user to enable or disable the collection, with the latency metrics returning PM_ERR_VALUE or PM_ERR_APPVERSION if they are requested while collection is disabled. For examples of PMDA behaviour being modified by storing into control metrics, see (as in pminfo -T) xfs.control.reset or sample.dodgey.control.
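
A rough sketch of how that could look inside the PMDA (the metric names, item numbers and the "collecting" flag below are purely illustrative, not from any existing code): the fetch callback declines to return latency values while collection is off, and the store method toggles the flag when the control metric is written with pmstore(1).

    /*
     * Illustrative sketch only -- hypothetical netlatency PMDA fragments.
     */
    #include <pcp/pmapi.h>
    #include <pcp/impl.h>
    #include <pcp/pmda.h>

    static int collecting;              /* 0 = instrumentation disabled */

    #define ITEM_CONTROL    0           /* e.g. network.latency.control */
    #define ITEM_IN_LATENCY 1           /* e.g. network.latency.in.total */

    static int
    netlatency_fetchCallBack(pmdaMetric *mdesc, unsigned int inst, pmAtomValue *atom)
    {
        switch (pmid_item(mdesc->m_desc.pmid)) {
        case ITEM_CONTROL:
            atom->ul = collecting;
            return 0;
        case ITEM_IN_LATENCY:
            if (!collecting)
                return PM_ERR_VALUE;    /* collection currently disabled */
            atom->ull = 0;              /* running latency sum for inst goes here */
            return 0;
        default:
            return PM_ERR_PMID;
        }
    }

    /* honour pmstore(1) writes to the control metric */
    static int
    netlatency_store(pmResult *result, pmdaExt *pmda)
    {
        int i;

        for (i = 0; i < result->numpmid; i++) {
            pmValueSet *vsp = result->vset[i];
            if (pmid_item(vsp->pmid) != ITEM_CONTROL)
                return PM_ERR_PERMISSION;   /* only the control metric is writable */
            /* valfmt/type checks omitted for brevity */
            collecting = (vsp->vlist[0].value.lval != 0);
            /* enable or disable the kernel instrumentation here */
        }
        return 0;
    }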

Cheers, Ken.
