[pcp] pmproxy intermittent failure

Nathan Scott nscott at aconex.com
Sun Mar 29 19:01:49 CDT 2009


Hi all,

We're observing an occasional problem when running pmproxy.  I've not
been able to find the cause from a quick code audit, so hopefully some
other kind soul may have some ideas.

Every now and again, pmproxy stops listening on its port.  This can be
seen with "netstat -tulnp" - the process is still running, but it isn't
accepting new connections.  The log file tends to have one or two client
connection attempts listed, partially set up but failed, and then it just
stops talking to the world.
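
For anyone wanting to catch this state without eyeballing netstat, a
small probe can simply try a TCP connect to the proxy port.  This is a
minimal sketch, not part of PCP: the function name is made up for
illustration, and the host and port you pass in are assumptions about
your own setup.

```python
import socket


def port_accepts_connections(host, port, timeout=2.0):
    """Return True if a TCP connect to (host, port) succeeds in time.

    Hypothetical helper for illustration only. It checks the TCP
    handshake and nothing else; it does not speak the pmproxy protocol.
    """
    try:
        # create_connection raises OSError (refused, timed out,
        # unreachable) when nothing is accepting on the port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling port_accepts_connections("localhost", 44322) from cron or a
monitoring check would flag the failure, assuming pmproxy is on its
usual port of 44322 on that host.  Note that it only tells you the
port went away, not why.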

The pmproxy log entries look something like:


Log for pmproxy on [HOST] started Wed Feb 18 07:31:29 2009

pmproxy: PID = 31126, PDU version = 2
pmproxy request port(s):
  sts fd  IP addr
  === === ========
  ok    0 [IPADDR]
AcceptNewClient: bad pmcd port "mproxy-client 1" recv from client at [IPADDR]
AcceptNewClient: bad version string () recv from client at [IPADDR]
AcceptNewClient: bad pmcd port "mproxy-client 1" recv from client at [IPADDR]


This is not a new issue; we've observed it on and off for a long,
long time (I observed the failure this morning on a machine running
PCP code from '07 - the above log is from a more recent PCP version
though).  Here's another log (this is the old PCP code):


Log for pmproxy on [HOST] started Mon Mar 30 08:27:53 2009

pmproxy: PID = 6894, PDU version = 2
pmproxy request port(s):
  sts fd  IP addr
  === === ========
  ok    0 [IPADDR]
AcceptNewClient: failed to get PMCD hostname () from client at [IPADDR]


So, it seems to fail at some random point during the initial setup,
during the client/server handshaking - sometimes right away, other
times after getting past the initial version string.  Most of the
time it works just fine, though ... it seems almost like we're getting
data from multiple clients on one descriptor?

Those errors should only cause the client to be disconnected, but we
see the server eventually stop listening as well.  The only cause I
can think of is that perhaps the server has accidentally closed the
file descriptor it was listening on?
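
One way to test the "closed the listening descriptor" theory is to
check, from inside the process, whether a given fd still refers to a
listening socket.  The sketch below shows the idea in Python rather
than in pmproxy's own C; the function name and return strings are made
up for illustration, and it relies on SO_ACCEPTCONN, which Linux
supports but which I have not checked on every platform.

```python
import errno
import os
import socket
import stat


def listening_fd_status(fd):
    """Classify fd as 'closed', 'not-a-socket', 'not-listening' or
    'listening'.  Hypothetical helper for illustration only."""
    try:
        mode = os.fstat(fd).st_mode
    except OSError as exc:
        if exc.errno == errno.EBADF:
            return "closed"  # the fd number is no longer open
        raise
    if not stat.S_ISSOCK(mode):
        return "not-a-socket"  # fd was reused for a file, pipe, etc.
    # Wrap a duplicate so that closing the wrapper leaves fd untouched.
    dup = socket.socket(fileno=os.dup(fd))
    try:
        accepting = dup.getsockopt(socket.SOL_SOCKET,
                                   socket.SO_ACCEPTCONN)
    finally:
        dup.close()
    return "listening" if accepting else "not-listening"
```

An equivalent check in pmproxy (fstat plus getsockopt on the request
port fd, which the logs above list as fd 0) could be run after each
failed AcceptNewClient to see whether the descriptor has gone away or
been reused.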

That second log is the second failure today on that host (it's getting
a lot more requests today than it normally would, so failure seems to
be at least somewhat related to pmproxy load - and that's load in terms
of both number of clients and amount of packets passing through).  Hmm,
could be time to instrument pmproxy along the lines of pmcd?

cheers.

-- 
Nathan


