Hi Martins,
[CC'ing Jamie, who has been hacking on a pcp netstat/nicstat recently]
----- Original Message -----
> Hi,
>
> I have a need to get netstat type output from pcp. We have a
> certain class of "runaway" processes that we'd like to monitor. The
> scenario is as follows. One of the machines gets sluggish, we log in and run:
>
> [vagrant@centos7 ]$ sudo netstat -auntpe
> Active Internet connections (servers and established)
> Proto Recv-Q Send-Q Local Address   Foreign Address State  User Inode PID/Program name
> tcp   0      0      127.0.0.1:25    0.0.0.0:*       LISTEN 0    17509 1606/master
> tcp   0      0      0.0.0.0:111     0.0.0.0:*       LISTEN 0    16390 958/rpcbind
> tcp   0      0      0.0.0.0:48368   0.0.0.0:*       LISTEN 29   16969 1205/rpc.statd
> tcp   0      0      0.0.0.0:22      0.0.0.0:*       LISTEN 0    16790 1068/sshd
> .............
>
>
> and find that the Recv-Q and Send-Q fields are high for some process. We
> restart the process and all is well again. I'd love to be able to set up
> pmie rules to monitor this type of thing and take action.
>
> It's a larger problem that needs to be fixed, but for now I just need a
> good way to deal with this broken software. Right now I have rules for
> high cpu load based on process name but this is not always a good
> indicator since the process sometimes can use cpu and be perfectly
> fine. The only accurate measure of a problem seems to be those Recv-Q
> and Send-Q items.
>
> The quickest thing to do would be to write a pmdanetstat, but I'm not
> sure if this should live in the proc pmda instead since all the
> information comes from the union of /proc/net/tcp with /proc/<pid>/fd.
> Then you could just look for a high value for some instance of:
>
> proc.fd.socket.recvq
>
> and all is well. But this is a case of needing a multidimensional
> instance, in this case either "pid,fd" or "pid,socketnum", since a
> process can have any number of sockets open. Any suggestions on a way
> to organize this better if the thought is that it should be part of the
> proc pmda? Maybe it's not worth it for the relatively small number of
> processes that would have these metrics? If there were an elegant way to
> do this, there might be other metrics that could be added to the
> "pid,fd" indom.
>
I think you're on a good path there. In pmdaproc we already have some
compound instances, like cgroup.blkio.dev, which uses a composite
"cgroup::device" string as its external identifier (with internal
identifiers managed by the pmdaCache routines). So I'd recommend keeping
it in pmdaproc, where the /proc/[pid]/ iteration code lives.
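
To make the idea concrete, here is a rough Python sketch of the join you
describe: socket inodes from /proc/net/tcp matched against the socket
symlinks under /proc/<pid>/fd, keyed by a composite "pid::fd" name. The
function names and the "pid::fd" format are assumptions for illustration
only (modelled on "cgroup::device"); this is not pmdaproc code, and it
only looks at IPv4 TCP sockets.

```python
import os
import re

# Socket fds appear in /proc/<pid>/fd as symlinks of the form "socket:[<inode>]".
SOCKET_LINK = re.compile(r"socket:\[(\d+)\]")


def parse_proc_net_tcp(text):
    """Map socket inode -> (send_q, recv_q) from /proc/net/tcp-style text."""
    queues = {}
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 10:
            continue
        # Field 4 is "tx_queue:rx_queue" in hex; field 9 is the socket inode.
        tx_hex, rx_hex = fields[4].split(":")
        queues[int(fields[9])] = (int(tx_hex, 16), int(rx_hex, 16))
    return queues


def socket_fds(pid, proc_root="/proc"):
    """Yield (fd, inode) for each socket fd of pid; skip anything unreadable."""
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    try:
        names = os.listdir(fd_dir)
    except OSError:
        return  # process exited, or we lack permission
    for name in names:
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # fd was closed between listdir and readlink
        match = SOCKET_LINK.fullmatch(target)
        if match:
            yield int(name), int(match.group(1))


def build_instances(queues, fds_by_pid):
    """Return {"pid::fd": (send_q, recv_q)} for sockets present in queues.

    The "pid::fd" naming is hypothetical, chosen to mirror "cgroup::device".
    """
    instances = {}
    for pid, fds in fds_by_pid.items():
        for fd, inode in fds:
            if inode in queues:
                instances[f"{pid}::{fd}"] = queues[inode]
    return instances
```

In a real PMDA those composite strings would be the external instance
names, with the internal identifiers handed out by the pmdaCache routines
as described above. Note that reading other users' /proc/<pid>/fd entries
generally needs root, the same as netstat -p.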
cheers.
--
Nathan