Hi Martins,
Yeah, this bit definitely looks like a bug in the attributes PDU handling -
we'll be sending this from pmcd to (some) PMDAs whenever a new AF_UNIX
client connection is established, to transfer the client's uid/gid attributes.
----- Original Message -----
> [...]
> Finally, the exact same thing happens if I kill the sample or linux
> pmdas, but not other pmdas. No problems killing: simple, xfs, ib, all
> perl pmdas. Don't understand this part.
This is possibly because some agents register an interest in attributes
with pmcd (linux, proc, sample), but others don't (simple, xfs, ib, etc).
That's a guess, but the agents that hang and those that don't seem to be
split along exactly that boundary.
> OK, hmm, the backtrace on pmcd when doing "pmval hinv.ncpu" after the
> proc_pmda has died:
>
>
> Program received signal SIGPIPE, Broken pipe.
> 0x00007fc064b4c520 in __write_nocancel () from /lib64/libc.so.6
> (gdb) bt
> #0 0x00007fc064b4c520 in __write_nocancel () from /lib64/libc.so.6
> #1 0x00007fc065023c03 in __pmXmitPDU (fd=11, pdubuf=0x7fc06686b000) at
> pdu.c:338
> [...]
> Since pmcd doesn't know that the proc_pmda has gone, AgentsAttributes
> tries to send it a message and then boom.
I'll dig into it some more, thanks. The mystery is why the write does not
fail straight away with an error (and SIGPIPE), given that the reading end
of the pipe/socket is closed. Very odd - it seems like the attributes PDU is
being handled differently from the other PDUs, but whatever it is, I can't
see it yet ... will keep looking, thanks Martins!
cheers.
--
Nathan