
Re: [pcp] netstat output from proc pmda

To: Nathan Scott <nathans@xxxxxxxxxx>
Subject: Re: [pcp] netstat output from proc pmda
From: Jamie Bainbridge <jbainbri@xxxxxxxxxx>
Date: Tue, 6 Oct 2015 14:24:58 +1000
Cc: Martins Innus <minnus@xxxxxxxxxxx>, pcp developers <pcp@xxxxxxxxxxx>
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <1100426246.49282610.1444088315890.JavaMail.zimbra@xxxxxxxxxx>
References: <560EB3BD.50100@xxxxxxxxxxx> <1100426246.49282610.1444088315890.JavaMail.zimbra@xxxxxxxxxx>
On Tue, Oct 6, 2015 at 9:38 AM, Nathan Scott <nathans@xxxxxxxxxx> wrote:
> Hi Martins,
>
> [CC'ing Jamie who has been hacking on a pcp netstat/nicstat recently]
>
> ----- Original Message -----
>> Hi,
>>
>>      I have a need to get netstat-type output from pcp. We have a
>> certain class of "runaway" processes that we'd like to monitor. The
>> scenario is as follows: one of the machines gets sluggish, we log in and do:
>>
>> [vagrant@centos7 ]$ sudo netstat -auntpe
>> Active Internet connections (servers and established)
>> Proto Recv-Q Send-Q Local Address           Foreign Address         State       User       Inode      PID/Program name
>> tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      0          17509      1606/master
>> tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN      0          16390      958/rpcbind
>> tcp        0      0 0.0.0.0:48368           0.0.0.0:*               LISTEN      29         16969      1205/rpc.statd
>> tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      0          16790      1068/sshd
>> .............
>>
>>
>> and find that the Recv-Q and Send-Q fields are high for some process; we
>> restart the process and all is well again.  I'd love to be able to set up
>> pmie rules to monitor this type of thing and take action.
>>
>> It's a larger problem that needs to be fixed, but for now I just need a
>> good way to deal with this broken software.  Right now I have rules for
>> high CPU load based on process name, but this is not always a good
>> indicator since the process can sometimes use CPU and be perfectly
>> fine.  The only accurate measure of a problem seems to be those Recv-Q
>> and Send-Q items.
>>
>> The quickest thing to do would be to write a pmdanetstat, but I'm not
>> sure if this should live in the proc pmda instead since all the
>> information comes from the union of /proc/net/tcp with /proc/<pid>/fd.
>> Then you could just look for a high value for some instance of:
>>
>> proc.fd.socket.recvq
>>
>> and all is well.  But this is a case of needing a multidimensional
>> instance, in this case either "pid,fd" or "pid,socketnum", since a
>> process can have any number of sockets open.  Any suggestions on a way
>> to organize this better if the thought is that it should be part of the
>> proc pmda?  Maybe it's not worth it for the relatively small number of
>> processes that would have these metrics?  If there was an elegant way to
>> do this, there might be other metrics that could be added to the
>> "pid,fd" indom.
>>
>
> I think you're on a good path there - in pmdaproc we already have some
> compound instances, like cgroup.blkio.dev, which uses a composite
> "cgroup::device" string as the external identifier (with internal
> identifiers managed by the pmdaCache routines).  So I'd recommend keeping
> it in pmdaproc, where the /proc/[pid]/ iteration code lives.


My current intention with my netstat task is only to implement
"netstat -s" functionality, as netstat is deprecated and there's no
replacement besides parsing /proc/net/$PROTO manually.
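For anyone wanting to prototype that before any PMDA work lands: the
counters behind "netstat -s" largely come from /proc/net/snmp (and
/proc/net/netstat), which is cheap to parse. A rough, untested Python
sketch, with the flat metric naming purely illustrative:

#!/usr/bin/env python
# Rough sketch only: turn /proc/net/snmp (one of the files behind
# "netstat -s") into a flat {"Tcp.ActiveOpens": 123, ...} dict.
# /proc/net/snmp is pairs of lines: a header naming the counters for a
# protocol, then a line with the matching values.

def read_proc_net_snmp(path="/proc/net/snmp"):
    counters = {}
    with open(path) as f:
        lines = f.readlines()
    for header, values in zip(lines[0::2], lines[1::2]):
        proto, names = header.split(":", 1)
        _, nums = values.split(":", 1)
        for name, num in zip(names.split(), nums.split()):
            counters["%s.%s" % (proto, name)] = int(num)
    return counters

if __name__ == "__main__":
    for name, value in sorted(read_proc_net_snmp().items()):
        print("%s %d" % (name, value))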

I hadn't looked into (or considered) per-socket stats, though these
would be a good idea to implement eventually.
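For reference, the join Martins describes is the per-socket queue depth
in /proc/net/tcp (keyed by socket inode) matched against the socket
inodes visible under /proc/<pid>/fd, which is roughly the walk a
proc.fd.socket.* style metric in pmdaproc would do internally. A rough,
untested Python sketch of that join, IPv4 TCP only, with all naming and
output purely illustrative:

#!/usr/bin/env python
# Rough sketch only: join /proc/net/tcp (per-socket rx/tx queue depths,
# keyed by socket inode) with /proc/<pid>/fd (which fds are sockets and
# which inode each refers to).  IPv4 TCP only; tcp6/udp/udp6 would need
# the same treatment on their own /proc/net files.

import os
import re

def socket_queues(path="/proc/net/tcp"):
    """Map socket inode -> (recv queue, send queue) in bytes."""
    queues = {}
    with open(path) as f:
        next(f)                         # skip the column header line
        for line in f:
            fields = line.split()
            # fields[4] is "tx_queue:rx_queue" in hex, fields[9] is inode
            txq, rxq = [int(v, 16) for v in fields[4].split(":")]
            queues[fields[9]] = (rxq, txq)
    return queues

def pid_socket_fds():
    """Map pid -> list of (fd, socket inode) pairs."""
    result = {}
    for pid in filter(str.isdigit, os.listdir("/proc")):
        fds = []
        try:
            for fd in os.listdir("/proc/%s/fd" % pid):
                target = os.readlink("/proc/%s/fd/%s" % (pid, fd))
                m = re.match(r"socket:\[(\d+)\]", target)
                if m:
                    fds.append((fd, m.group(1)))
        except OSError:                 # raced with exit, or no permission
            continue
        if fds:
            result[pid] = fds
    return result

if __name__ == "__main__":
    queues = socket_queues()
    for pid, fds in pid_socket_fds().items():
        for fd, inode in fds:
            if inode in queues:
                rxq, txq = queues[inode]
                # "pid::fd" as a composite external instance name, along
                # the lines of the cgroup::device identifiers Nathan mentions
                print("%s::%s recvq %d sendq %d" % (pid, fd, rxq, txq))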

As you likely know, you could write a pmie rule on increments of
network.tcp.listenoverflows, though this isn't ideal as it's not
per-process, and it's also reacting after the fact rather than
restarting the process before the socket backlog reaches some
watermark.
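For what it's worth, since pmie rate-converts counters, that rule is
only a couple of lines, and if something like the proc.fd.socket.recvq
metric Martins proposes existed, the restart action could hang off that
instead. A rough sketch only; the threshold, the service name and the
proc.fd.socket.recvq metric itself are all made up / not yet real:

delta = 30 sec;

// after the fact: fires on any increase in the (rate-converted) counter
network.tcp.listenoverflows > 0
    -> syslog "TCP listen queue overflows on %h";

// what a per-socket metric would allow, if/when it exists
some_inst ( proc.fd.socket.recvq > 1000000 )
    -> shell "systemctl restart broken-service";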

Cheers,
Jamie
