Lukas and Frank,
On 2/26/2015 3:46 PM, Frank Ch. Eigler wrote:
Please see commit 7347927a67849a74b67d8b25fb58c033ee79042d on
git://sourceware.org/git/pcpfans.git lberk/dev
Thanks, nice work! I've included some Buffalo folks on cc:, because the
idea for this PR [1] came from their site's need at CCR for
job-specific pmlogger data. It would be nice to know whether, with
this facility, a "pmlc one-shot METRIC" widget would be helpful or
redundant.
Thanks for this! This will be useful for tracking PIDs on nodes running
single jobs. I think a "pmlc one-shot set-of-metrics/configfile"
command would still be useful: it would interleave the results into the
currently running primary logger. For now, we plan to set up separate
"once" loggers that we can fire off as needed, and then merge the
archives afterwards. This may be good enough, but we haven't
implemented it yet, so we can't be sure there are no issues.
The use case is to annotate, as precisely as possible, times that may
be "interesting" in some way. For instance, during a single job we may
want to mark the boundaries between the preprocessing, compute, and
post-processing phases of that job and collect stats at each boundary.
We want a record at the exact moment each transition occurs, regardless
of the default sampling interval. We are able to run an arbitrary shell
script at these times.
We can't just log more frequently by default, since we have a mix of
jobs that run for less than a minute and others that run for days or
weeks on the same nodes, and logging often enough for the short jobs
would generate too much data overall. We already log every 30 seconds
and still miss some information. With these one-shot events, we could
probably log less frequently by default. If we didn't have shared
resources, this would be much easier, but we can have 10-20 jobs per
node, and running a separate high-resolution logger for each job would
create too much data.
I think this solution is very useful, but we would also use the one-shot
facility if it existed.
Thanks.
Martins