Hi, Dave -
> 1. I'm looking at the code in src/pmlogger/src/fetch.c where 'if (changed &
> PMCD_ADD_AGENT)' is handled. It seems to me that this test which adds a
> mark record in the case a pmda (re)starts (outside the loop which handles
> the received pdus) is too late [...]
You could put a putmark() call at the sites within the loop where
disconnect() is currently called (lines 180 / 184 / 188). (I'm not
sure why the code doesn't break out of the "while (n==0)" loop at
those points.)
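A rough illustration of that control flow, as a self-contained mock:
stub_disconnect(), stub_putmark(), fetch_loop() and the integer "PDU
status" array are hypothetical stand-ins, not pmlogger's actual code or
signatures, and the real loop in fetch.c handles more cases.

```c
#include <assert.h>

/* Hypothetical stand-ins for pmlogger's disconnect() and putmark().
 * They only count calls so the control flow can be observed; the
 * real functions and their signatures may differ. */
static int disconnects = 0;
static int marks = 0;

static void stub_disconnect(int sts) { (void)sts; disconnects++; }
static void stub_putmark(void)       { marks++; }

/* Simplified model of the receive loop. Each element of pdus[] is the
 * status of one received PDU: 0 = nothing useful yet, keep waiting;
 * < 0 = error / PMCD connection lost; > 0 = a usable result. */
static int fetch_loop(const int *pdus, int npdus)
{
    int n = 0;
    int i = 0;

    assert(npdus >= 0);
    while (n == 0 && i < npdus) {
        int sts = pdus[i++];
        if (sts < 0) {
            /* At each site that currently calls disconnect(), write the
             * mark record right away, then leave the loop instead of
             * waiting for more PDUs on a dead connection. */
            stub_disconnect(sts);
            stub_putmark();
            n = sts;
            break;
        }
        n = sts;
    }
    return n;
}
```

In this toy version the break is redundant once n is nonzero, but it
makes the exit explicit; in the real loop it would also skip whatever
code follows in the loop body.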
> 2. I'm considering two possibilities for checking the consistency of the
> PMNS+metadata:
> 1. Check the consistency of all metrics in all task list items at this point
> [...]
> 2. Check the consistency of metrics as they are fetched later
> [...]
> - pro: pmlogger may potentially continue indefinitely, since inactive
>   metrics may never be flagged
> - con: error may be harder to relate to the actual event, since it
>   may be detected much later
Disconnections / reconnections should be logged, so relating a
late-detected error back to the event that caused it should still be
manageable.
Doing the checks incrementally should work, without much new
infrastructure. For example, you could tag each task_t structure with
a timestamp of the last time its t_pmidlist / t_desclist were looked
up from its t_namelist. Each reconnect would update a global
timestamp. If the global reconnect timestamp is newer than a task's
timestamp, do_work() redoes the lookup and consistency check for that
task.
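A minimal sketch of that incremental check, assuming a trimmed-down
task_t. The field t_lookup_gen, the global reconnect_gen and the
helpers note_reconnect(), relookup_and_check() and maybe_recheck() are
hypothetical names. One substitution: instead of wall-clock timestamps
it uses a reconnect counter, which gives the same "is the reconnect
newer?" test but can't tie within the clock's resolution or go
backwards.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down task_t showing only what this sketch
 * needs. t_lookup_gen is an assumed new field recording the reconnect
 * generation at which t_pmidlist / t_desclist were last derived from
 * t_namelist. */
typedef struct task {
    unsigned long t_lookup_gen;
    int           t_checks;     /* demo only: how many re-lookups ran */
} task_t;

/* Assumed new global, bumped on every reconnect to PMCD. It stands in
 * for the "global reconnect timestamp". */
static unsigned long reconnect_gen = 0;

/* Call from the reconnect path. */
static void note_reconnect(void)
{
    reconnect_gen++;
}

/* Placeholder for re-resolving t_namelist (e.g. via pmLookupName() and
 * pmLookupDesc()) and comparing the results with the metadata already
 * written to the archive. */
static void relookup_and_check(task_t *tp)
{
    tp->t_checks++;
    tp->t_lookup_gen = reconnect_gen;
}

/* Call at the top of do_work(): redo the lookup only if a reconnect
 * happened after this task's lists were last looked up. */
static void maybe_recheck(task_t *tp)
{
    assert(tp != NULL);
    if (reconnect_gen > tp->t_lookup_gen)
        relookup_and_check(tp);
}
```

When a task's lists are first built, setting its t_lookup_gen to the
current reconnect_gen avoids an immediate redundant re-check. Because
the check runs per task inside do_work(), metrics in tasks that never
fire are never re-checked, matching the "inactive metrics may never be
flagged" point above.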
- FChE