Re: [pcp] [RFC] Collect additional metrics from /proc/net/netstat

To: Michele Baldessari <michele@xxxxxxxxxx>
Subject: Re: [pcp] [RFC] Collect additional metrics from /proc/net/netstat
From: Nathan Scott <nathans@xxxxxxxxxx>
Date: Thu, 15 May 2014 02:49:12 -0400 (EDT)
Cc: pcp@xxxxxxxxxxx
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <1400111497-2534-1-git-send-email-michele@xxxxxxxxxx>
References: <1400111497-2534-1-git-send-email-michele@xxxxxxxxxx>
Reply-to: Nathan Scott <nathans@xxxxxxxxxx>
Thread-index: uJj40BP4uZMRL5Xy4NJS5r7SojMaSw==
Thread-topic: Collect additional metrics from /proc/net/netstat
Hi Michele,

----- Original Message -----
> Hi all,
> 
> this patch mimics src/pmdas/linux/proc_net_snmp.[ch] and extends the
> network.{tcp,ip} namespaces in order to get a better glimpse at the
> behaviour of the Linux network stack.
> 
> The rationale behind these new metrics is that these days, in order
> to troubleshoot network performance issues you really need to look at
> all of these metrics to understand why a certain workload is not
> performing as expected. I will also submit two other separate PMDAs
> (sctp and ethtool -S) later on.

Very cool, thanks!

> RFC since a) I just started reading up on PCP and b) I am unsure
> if it is okay to add ~120 metrics in the Linux PMDA or if it is
> preferable to put these in a separate one.
> 
> Any feedback is appreciated.

I'll review more closely once this current release is out the door,
but the answer to b) is "it's fine".  If/when we do reach a point down
the track where we collectively decide there are too many metrics in a
particular PMDA, there are approaches we've used before to split 'em
up (while preserving the namespace, transitioning PMIDs safely, etc).

A few other things to think about at this stage:

- it'd be a good idea to consider whether all/some/none of these new
metrics should be in the default-logged (by the system pmlogger) set
of metrics (and what sort of sampling interval would suit);

- whether default or not, a src/pmlogconf/tools/netstat file (or some
other similarly named file close by) would be a welcome addition
to help people record these metrics;

- in terms of automated testing, the qa/957 test case will
automatically use valgrind to check your new additions to the kernel
PMDA, but we should add two new (similar) tests for the other two new
PMDAs (I'll help you with that once they exist if you like);

- we should also check that the actual values being exported match
what we expect.  In some cases we've used other tools for verifying
(see qa/635 for a networking example) but I have another, more generic
approach planned for the kernel metrics (incl. these) which I hope
to tackle next release.  Leave that with me & I'll get back to you
on that one; I may just need some test case data from you there.
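
To make the value-checking idea a bit more concrete, here is a rough
sketch (not the approach planned for the QA suite, just an illustration)
of cross-checking exported values against the raw file.  It relies on
/proc/net/netstat being laid out as pairs of lines: a header line of
counter names and a line of values, both starting with the same
"Prefix:" token.  The function names, the "Prefix.Name" key scheme and
the sample data are all made up for this example.

```python
def parse_netstat(text):
    """Parse /proc/net/netstat-style text into {"Prefix.Name": int}."""
    values = {}
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Lines come in pairs: names first, then the matching values.
    for header, data in zip(lines[0::2], lines[1::2]):
        prefix, names = header.split(":", 1)
        data_prefix, numbers = data.split(":", 1)
        if prefix != data_prefix:
            raise ValueError(f"mismatched prefixes: {prefix!r} vs {data_prefix!r}")
        names, numbers = names.split(), numbers.split()
        if len(names) != len(numbers):
            raise ValueError(f"{prefix}: {len(names)} names, {len(numbers)} values")
        for name, number in zip(names, numbers):
            values[f"{prefix}.{name}"] = int(number)
    return values


def find_mismatches(expected, exported):
    """Return keys whose exported value is missing or differs from expected."""
    return sorted(k for k, v in expected.items() if exported.get(k) != v)


# Made-up sample in the kernel's two-line layout (hypothetical values).
SAMPLE = """\
TcpExt: SyncookiesSent SyncookiesRecv SyncookiesFailed
TcpExt: 0 0 3
IpExt: InNoRoutes InTruncatedPkts InMcastPkts
IpExt: 0 0 42
"""

expected = parse_netstat(SAMPLE)
# Stand-in for values fetched from the PMDA; one counter deliberately wrong.
exported = dict(expected, **{"IpExt.InMcastPkts": 41})
print(find_mismatches(expected, exported))
```

In a real test the "exported" side would come from the PMDA run against
a canned copy of the file, so that the comparison is deterministic.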

cheers.

--
Nathan
