On Tue, Oct 6, 2015 at 9:38 AM, Nathan Scott <nathans@xxxxxxxxxx> wrote:
> Hi Martins,
>
> [CC'ing Jamie who has been hacking on a pcp netstat/nicstat recently]
>
> ----- Original Message -----
>> Hi,
>>
>> I have a need to get netstat type output from pcp. We have a
>> certain class of "runaway" processes that we'd like to monitor. The
>> scenario is as follows. One of the machines gets sluggish, we log in and do:
>>
>> [vagrant@centos7 ]$ sudo netstat -auntpe
>> Active Internet connections (servers and established)
>> Proto Recv-Q Send-Q Local Address    Foreign Address  State   User Inode PID/Program name
>> tcp        0      0 127.0.0.1:25     0.0.0.0:*        LISTEN  0    17509 1606/master
>> tcp        0      0 0.0.0.0:111      0.0.0.0:*        LISTEN  0    16390 958/rpcbind
>> tcp        0      0 0.0.0.0:48368    0.0.0.0:*        LISTEN  29   16969 1205/rpc.statd
>> tcp        0      0 0.0.0.0:22       0.0.0.0:*        LISTEN  0    16790 1068/sshd
>> .............
>>
>>
>> and find that the Recv-Q and Send-Q fields are high for some process; we
>> restart the process and all is well again. I'd love to be able to set up
>> pmie rules to monitor this type of thing and take action.
>>
>> It's a larger problem that needs to be fixed, but for now I just need a
>> good way to deal with this broken software. Right now I have rules for
>> high CPU load based on process name, but this is not always a good
>> indicator, since the process can sometimes use CPU and be perfectly
>> fine. The only accurate measure of a problem seems to be those Recv-Q
>> and Send-Q values.
>>
>> The quickest thing to do would be to write a pmdanetstat, but I'm not
>> sure if this should live in the proc pmda instead since all the
>> information comes from the union of /proc/net/tcp with /proc/<pid>/fd.
>> Then you could just look for a high value for some instance of:
>>
>> proc.fd.socket.recvq
>>
>> and all is well. But this requires a multidimensional
>> instance, in this case either "pid,fd" or "pid,socketnum", since a
>> process can have any number of sockets open. Any suggestions on a way
>> to organize this better, if the thought is that it should be part of the
>> proc pmda? Maybe it's not worth it for the relatively small number of
>> processes that would have these metrics? If there was an elegant way to
>> do this, there might be other metrics that could be added to the
>> "pid,fd" indom.
>>
>
> I think you're on a good path there - in pmdaproc we already have some
> compound instances, like cgroup.blkio.dev, which uses a composite
> "cgroup::device" string as the external identifier (with the internal
> identifiers managed by the pmdaCache routines). So I'd recommend keeping
> it in pmdaproc, where the /proc/[pid]/ iteration code lives.
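For what it's worth, the composite-identifier scheme could be sketched outside
the real pmdaCache API like this (a toy stand-in, not PCP code; the "pid::fd"
naming is just the composite idea from above):

```python
# Toy sketch of composite external instance names ("pid::fd") mapped to
# stable internal instance ids, mimicking what pmdaCache routines provide.
class InstanceCache:
    def __init__(self):
        self._ids = {}   # external instance name -> internal id
        self._next = 0

    def store(self, name):
        """Return a stable internal id for an external instance name."""
        if name not in self._ids:
            self._ids[name] = self._next
            self._next += 1
        return self._ids[name]

cache = InstanceCache()
inst = cache.store("1205::7")   # pid 1205, fd 7 as one composite instance
```

The point being that the external name encodes both dimensions, while PMAPI
clients only ever see a single flat instance domain.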
My current intention with my netstat task is to implement only the
"netstat -s" functionality, as netstat is deprecated and there's no
replacement besides parsing /proc/net/$PROTO manually.
I hadn't looked into (or considered) per-socket stats, though these
would be a good idea to implement eventually.
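For the record, the /proc/net/tcp with /proc/<pid>/fd join described above
could be sketched like this (field positions per the proc(5) format for
/proc/net/tcp; the helper names are mine, not anything in PCP):

```python
# Sketch: join /proc/net/tcp queue depths with the socket inodes held
# open by each process, via the socket:[inode] symlinks in /proc/<pid>/fd.
import os
import re

def parse_tcp_line(line):
    """Parse one data row of /proc/net/tcp.
    Returns (inode, recvq, sendq); queue sizes are hex in the tx:rx field."""
    f = line.split()
    txq, rxq = (int(x, 16) for x in f[4].split(':'))  # tx_queue:rx_queue
    return int(f[9]), rxq, txq                        # f[9] is the inode

def socket_queues():
    """Map socket inode -> (recvq, sendq) for all TCP sockets."""
    with open('/proc/net/tcp') as fh:
        next(fh)                                      # skip the header row
        return {ino: (r, s) for ino, r, s in map(parse_tcp_line, fh)}

def pid_sockets(pid):
    """Yield the socket inodes a process holds open."""
    fd_dir = '/proc/%d/fd' % pid
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue                                  # fd closed mid-scan
        m = re.match(r'socket:\[(\d+)\]', target)
        if m:
            yield int(m.group(1))

if __name__ == '__main__':
    queues = socket_queues()
    for pid in filter(str.isdigit, os.listdir('/proc')):
        try:
            for ino in pid_sockets(int(pid)):
                recvq, sendq = queues.get(ino, (0, 0))
                if recvq or sendq:
                    print(pid, ino, recvq, sendq)
        except OSError:
            continue                                  # process exited/denied
```

Note the scan needs root to read other users' /proc/<pid>/fd, which is one
more argument for doing this inside pmdaproc rather than ad hoc.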
As you likely know, you could write a pmie rule on increments of
network.tcp.listenoverflows. That isn't ideal, though: it's not
per-process, and it reacts after the fact rather than restarting the
process before a watermark of the socket backlog is reached.
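A minimal sketch of such a rule (pmie rate-converts counter metrics
automatically, so a positive value means overflows occurred during the last
sample interval; the rule name and message text are made up):

```
// sketch of a pmie(1) rule, not a tested configuration
delta = 30 sec;
listen_overflow =
    network.tcp.listenoverflows > 0
        -> syslog "TCP listen queue overflows detected";
```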
Cheers,
Jamie