Comment #7 on bug 1100 from Ken McDonell
(In reply to comment #6)
> The simplest response of exiting pmlogger upon
> a PMCD_ADD_AGENT could be fine.
We'd rather not do this. In the most common case, PMCD_ADD_AGENT happens
because a PMDA that is new to pmlogger has been added, and there is no reason
to terminate pmlogger when that happens.
Rather, when PMCD_ADD_AGENT happens, pmlogger needs to mark all of its cached
metadata as "to be verified" and then, on each subsequent fetch, recheck any
"to be verified" metadata for the metrics in that fetch. If the verification
fails, pmlogger should report the failure and exit.
Even if the PMDA is known to pmlogger, the most common scenario (outside the
PMDA developer community) would be that the new PMDA has exactly the same
metadata as the old PMDA and pmlogger can continue without issue.
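The "to be verified" scheme above could look roughly like the following C
sketch. It is an illustration under stated assumptions, not pmlogger code: the
types and functions (desc_t, cache_entry_t, mark_all_unverified,
verify_on_fetch) are hypothetical and are not part of libpcp, and the metric
descriptor is reduced to a few fields (type, semantics, units) for brevity.

```c
#include <stdio.h>

/* Hypothetical, simplified metric descriptor; the real libpcp pmDesc
 * has more fields. Only what this sketch needs is kept. */
typedef struct {
    unsigned int pmid;   /* metric identifier */
    int type;            /* value data type */
    int sem;             /* semantics (counter, instant, ...) */
    int units;           /* units, simplified to a single int here */
} desc_t;

/* Hypothetical cache entry: metadata already written to the archive,
 * plus a flag saying whether it must be rechecked. */
typedef struct {
    desc_t logged;
    int to_be_verified;
} cache_entry_t;

/* On PMCD_ADD_AGENT: mark every cached descriptor as suspect.
 * Nothing is discarded and pmlogger keeps running. */
static void mark_all_unverified(cache_entry_t *cache, size_t n)
{
    for (size_t i = 0; i < n; i++)
        cache[i].to_be_verified = 1;
}

/* Called for each metric in a fetch; 'current' is the descriptor PMCD
 * reports now. Returns 0 if logging can continue, or -1 if the metadata
 * changed, in which case the caller should report it and exit. */
static int verify_on_fetch(cache_entry_t *e, const desc_t *current)
{
    if (!e->to_be_verified)
        return 0;                        /* already checked since the last change */
    if (e->logged.type != current->type ||
        e->logged.sem != current->sem ||
        e->logged.units != current->units) {
        /* hypothetical message text, not real pmlogger output */
        fprintf(stderr, "metadata for PMID %u changed\n", e->logged.pmid);
        return -1;
    }
    e->to_be_verified = 0;               /* unchanged metadata: the common case */
    return 0;
}
```

The point of the lazy check is that PMCD_ADD_AGENT costs almost nothing up
front: each entry is only compared the next time its metric is fetched, so in
the common case where the restarted or newly added PMDA exports identical
metadata, pmlogger simply carries on.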
> Only thing I'd add is the importance of libpcp to better detect and
> tolerate or reject malformed archives (however they were created),
> instead of killing the application with an assert or segv.
The importance is well and truly understood. But PCP archives use complex
data structures, and although decoding the records can (and does) include a
lot of checking, there are cases (like this one) where I don't know how the
code could detect that the data is bad. We should, however, be able to detect
the consequences later and not die. I will dig deeper to see why things went
so badly in this case. There are also several earlier "warnings" that perhaps
should be fatal errors, especially as I've never seen them at all with "good"
archives.