Hi,
I need to get netstat-type output from PCP. We have a certain class
of "runaway" processes that we'd like to monitor. The scenario is as
follows: one of the machines gets sluggish, we log in and run:
[vagrant@centos7 ]$ sudo netstat -auntpe
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address    Foreign Address  State   User  Inode  PID/Program name
tcp        0      0 127.0.0.1:25     0.0.0.0:*        LISTEN  0     17509  1606/master
tcp        0      0 0.0.0.0:111      0.0.0.0:*        LISTEN  0     16390  958/rpcbind
tcp        0      0 0.0.0.0:48368    0.0.0.0:*        LISTEN  29    16969  1205/rpc.statd
tcp        0      0 0.0.0.0:22       0.0.0.0:*        LISTEN  0     16790  1068/sshd
.............
and find that the Recv-Q and Send-Q values are high for some process.
We restart that process and all is well again. I'd love to be able to
set up pmie rules to monitor for this and take action.
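For reference, netstat takes Recv-Q and Send-Q from the tx_queue:rx_queue column of /proc/net/tcp (hex values; for LISTEN sockets rx_queue is the pending accept backlog). A minimal Python sketch of parsing one data line of that file, assuming the usual Linux field layout; the function names are hypothetical and only IPv4 addresses are decoded:

```python
import socket
import struct

# TCP state codes as printed (in hex) in the "st" column of /proc/net/tcp.
# Only a few common states are listed; others are returned as raw hex.
TCP_STATES = {1: "ESTABLISHED", 6: "TIME_WAIT", 8: "CLOSE_WAIT", 10: "LISTEN"}


def decode_ipv4(hex_addr: str) -> str:
    """Decode an IPv4 'ADDR:PORT' pair from /proc/net/tcp.

    The kernel prints the address as a 32-bit value in host byte order,
    so it is repacked in native ("=") order to recover the original bytes.
    The port is printed as a plain hex number.
    """
    addr, port = hex_addr.split(":")
    ip = socket.inet_ntoa(struct.pack("=I", int(addr, 16)))
    return f"{ip}:{int(port, 16)}"


def parse_tcp_line(line: str) -> dict:
    """Parse one data line (not the header) of /proc/net/tcp."""
    fields = line.split()
    tx_queue, rx_queue = fields[4].split(":")
    return {
        "local": decode_ipv4(fields[1]),
        "remote": decode_ipv4(fields[2]),
        "state": TCP_STATES.get(int(fields[3], 16), fields[3]),
        "send_q": int(tx_queue, 16),  # what netstat shows as Send-Q
        "recv_q": int(rx_queue, 16),  # what netstat shows as Recv-Q
        "uid": int(fields[7]),
        "inode": int(fields[9]),
    }
```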
It's a larger problem that needs to be fixed, but for now I just need a
good way to deal with this broken software. Right now I have rules for
high CPU load based on process name, but this is not always a good
indicator, since the process can sometimes use a lot of CPU and be
perfectly fine. The only reliable sign of a problem seems to be those
Recv-Q and Send-Q values.
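Since the queue sizes are the signal, a rule should probably fire only when they stay high rather than on a momentary spike. A Python sketch of that logic, assuming per-socket Recv-Q/Send-Q samples are already being collected; QueueWatch, the threshold and the sample count are all hypothetical and not part of PCP:

```python
from collections import deque


class QueueWatch:
    """Flag a socket whose queues stay above a threshold.

    Fires only after `samples` consecutive readings exceed `threshold`,
    so a brief burst of traffic does not trigger a restart.
    """

    def __init__(self, threshold: int, samples: int = 3):
        self.threshold = threshold
        # Holds one True/False per recent sample; old ones fall off.
        self.history = deque(maxlen=samples)

    def update(self, recv_q: int, send_q: int) -> bool:
        """Record one sample; return True if action should be taken."""
        self.history.append(max(recv_q, send_q) > self.threshold)
        return len(self.history) == self.history.maxlen and all(self.history)
```

For example, with `QueueWatch(1000, 3)` the third consecutive high sample returns True, and a single low sample afterwards resets it to False.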
The quickest thing to do would be to write a pmdanetstat, but I'm not
sure whether this should live in the proc PMDA instead, since all the
information comes from joining /proc/net/tcp with /proc/<pid>/fd.
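The join netstat performs can be sketched in Python: each socket fd under /proc/<pid>/fd is a symlink whose target reads socket:[inode], and that inode matches the inode column of /proc/net/tcp. The function names and the (pid, fd) result layout are assumptions for illustration; reading other users' fds requires root:

```python
import os
import re

SOCKET_LINK = re.compile(r"^socket:\[(\d+)\]$")


def socket_inode(link_target: str):
    """Return the inode from a 'socket:[NNN]' fd link target, else None."""
    m = SOCKET_LINK.match(link_target)
    return int(m.group(1)) if m else None


def socket_fds_by_pid(proc_root: str = "/proc") -> dict:
    """Map pid -> {fd: socket inode} by reading /proc/<pid>/fd symlinks.

    Processes that exit mid-scan, or whose fds we may not read
    (other users' processes without root), are skipped.
    """
    result = {}
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return result
    for entry in entries:
        if not entry.isdigit():
            continue  # skip /proc/self, /proc/net, ...
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue
        sockets = {}
        for fd in fds:
            try:
                inode = socket_inode(os.readlink(os.path.join(fd_dir, fd)))
            except OSError:
                continue  # fd closed between listdir and readlink
            if inode is not None:
                sockets[int(fd)] = inode
        if sockets:
            result[int(entry)] = sockets
    return result


def fd_socket_queues(queues_by_inode: dict, fds_by_pid: dict) -> dict:
    """Join {inode: (recv_q, send_q)} with {pid: {fd: inode}}.

    Returns {(pid, fd): (recv_q, send_q)}, one value per instance of a
    hypothetical proc.fd.socket.recvq/sendq pair. Inodes not found in
    the TCP table (UNIX or UDP sockets, for example) are left out.
    """
    return {
        (pid, fd): queues_by_inode[inode]
        for pid, fds in fds_by_pid.items()
        for fd, inode in fds.items()
        if inode in queues_by_inode
    }
```

A socket inherited across fork() shows up under more than one pid, so the same queue values can appear for several (pid, fd) pairs.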
Then you could just look for a high value for some instance of:
proc.fd.socket.recvq
and all is well. But this needs a multidimensional instance domain,
either "pid,fd" or "pid,socketnum", since a process can have any number
of sockets open. Any suggestions on how to organize this if the thought
is that it should be part of the proc PMDA? Maybe it's not worth it for
the relatively small number of processes that would have these metrics?
If there were an elegant way to do this, there might be other metrics
that could be added to a "pid,fd" indom.
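One possible way to give a "pid,fd" indom integer instance ids is to hand out a new id whenever a pair first appears and drop it when the fd or process goes away. A Python sketch; the class name and the "pid/fd" external-name format are assumptions for illustration only:

```python
class PidFdIndom:
    """Hypothetical instance domain keyed on (pid, fd) pairs.

    Instance ids are integers, so each pair gets the next free id the
    first time it is seen and keeps it while it is present. Ids are
    never reused in this sketch, so a new socket is not confused with
    an old one that had the same id.
    """

    def __init__(self):
        self._ids = {}   # (pid, fd) -> instance id
        self._next = 0

    def refresh(self, pairs) -> dict:
        """Return {inst_id: name} for the current set of (pid, fd) pairs."""
        current = set(pairs)
        # Forget pairs that have gone away (process exited or fd closed).
        for pair in list(self._ids):
            if pair not in current:
                del self._ids[pair]
        # Assign ids to newly seen pairs in a deterministic order.
        for pair in sorted(current):
            if pair not in self._ids:
                self._ids[pair] = self._next
                self._next += 1
        return {inst: f"{pid}/{fd}" for (pid, fd), inst in self._ids.items()}
```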
Otherwise, I think it would be very straightforward to do as a new
pmdanetstat where the instances are the inodes, since those should be
unique. And I may just do that locally for now to get something up and
running quickly.
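As a rough idea of the data side of such a pmdanetstat, a Python sketch that builds the inode-keyed table from /proc/net/tcp and /proc/net/tcp6 (queues_by_inode is a hypothetical name; the PMDA registration itself is not shown). One caveat on uniqueness: TIME_WAIT entries report inode 0, so they are skipped here:

```python
def queues_by_inode(paths=("/proc/net/tcp", "/proc/net/tcp6")) -> dict:
    """Build {inode: (recv_q, send_q)} from the kernel's TCP tables.

    Entries with inode 0 (e.g. TIME_WAIT sockets, which no longer
    belong to a process) are skipped because they are not unique.
    """
    result = {}
    for path in paths:
        try:
            with open(path) as f:
                next(f, None)  # skip the header line
                for line in f:
                    fields = line.split()
                    if len(fields) < 10:
                        continue
                    inode = int(fields[9])
                    if inode == 0:
                        continue
                    tx, rx = fields[4].split(":")
                    result[inode] = (int(rx, 16), int(tx, 16))
        except OSError:
            continue  # e.g. tcp6 missing when IPv6 is disabled
    return result
```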
Thanks
Martins