A PM_CONTEXT_UNIFIED host-oriented context type
===============================================
Goal: Transparent transition between live and archive mode requests,
or between one archive and another for the same host. The model
of pushing the responsibility of dealing with these issues onto
the user is a significant barrier to entry and makes performance
problem solving a more difficult process with PCP than it could
be. This keeps coming up as something we need to do better.
The user of any PCP client tool should be able to simply request data
for <host> (name), and be able to utilize both live and archived data
seamlessly, without having to deal with the following issues:
- Knowledge of standard paths/locations for PCP archives. A set of
log locations suitable for the requested host should be scanned,
defaulting to $PCP_LOG_DIR/<host>/ (pmlogger_daily - i.e.
system-wide loggers) and $HOME/.pcp/pmlogger/<host>/ (pmchart -
i.e. user-specific loggers).
- Knowledge of the timestamping scheme being used by the tools that
record data to these locations.
- Ability to deal with logs that are actively growing (IOW pmlogger
is actively writing to the end of the log files).
- Capturing end of archive (PM_ERR_EOL) and transitioning to a
separate, live context without client tool knowledge when data
is being sampled "forward" temporally.
- Conversely, capturing start of archive (either "backward" replay
or a time window is requested which spans the archive start) in a
pmFetch and automatically transitioning to an earlier archive.
- Dealing with archives that "overlap", and ensuring an accurate
representation of the values they contain is seamlessly presented
via all client tools.
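As a rough illustration of the first point, the default scan locations for a
host could be built as below. This is only a sketch: unified_archive_dirs and
the UNIFIED_* constants are hypothetical names, not existing libpcp
interfaces, and the directory layout is taken directly from the two defaults
listed above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define UNIFIED_MAX_DIRS 2      /* hypothetical: system-wide + per-user */
#define UNIFIED_PATH_MAX 1024   /* hypothetical path buffer size */

/*
 * Hypothetical helper (not a libpcp API): fill dirs[] with the archive
 * directories to scan for <host>, system-wide location first, then the
 * per-user location.  Either base directory may be NULL, in which case
 * that location is skipped.
 * Returns the number of entries written, or -1 if a path would not fit.
 */
int
unified_archive_dirs(const char *pcp_log_dir, const char *home,
                     const char *host,
                     char dirs[UNIFIED_MAX_DIRS][UNIFIED_PATH_MAX])
{
    int n = 0, len;

    assert(host != NULL && dirs != NULL);

    if (pcp_log_dir != NULL) {
        /* $PCP_LOG_DIR/<host>/ - system-wide loggers (pmlogger_daily) */
        len = snprintf(dirs[n], UNIFIED_PATH_MAX, "%s/%s", pcp_log_dir, host);
        if (len < 0 || len >= UNIFIED_PATH_MAX)
            return -1;
        n++;
    }
    if (home != NULL) {
        /* $HOME/.pcp/pmlogger/<host>/ - user-specific loggers (pmchart) */
        len = snprintf(dirs[n], UNIFIED_PATH_MAX, "%s/.pcp/pmlogger/%s",
                       home, host);
        if (len < 0 || len >= UNIFIED_PATH_MAX)
            return -1;
        n++;
    }
    return n;
}
```

In a real implementation the two base directories would presumably come from
pmGetConfig("PCP_LOG_DIR") and getenv("HOME"); they are passed in as
parameters here so the sketch stays independent of the environment.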
Rationale:
New users expect this; it is unduly difficult to make this work today
(it requires pmlogmerge/pmlogextract to combine multiple archives into
new archives, which can be problematic too - large archive files, ENOSPC,
potentially large amounts of data to scan, lots of write activity.
Opening/closing multiple archives automatically could be a lot cleaner
and quicker).
Experienced users want this; when exploring an actively-happening perf
problem in a production environment it's useful to say "pmstat -S -10m"
-> "run pmstat, from 10 minutes ago, up to now and then keep sampling"
but this is not possible currently, and a lot of time is wasted seeking
out the right archive from today's set.
Many tools become much simpler with this concept - in particular, the
pmchart and pmtime default user interfaces become simpler, and no longer
need to have a different personality for the two modes (although
the ability to run in explicit LIVE or ARCHIVE context mode would be
retained for back-compat, it doesn't need to be put in front of new
users -> use of -a/-h could trigger those behaviours).
Issues:
Backwards compatibility - semantics of both existing PM_CONTEXT_HOST
and PM_CONTEXT_ARCHIVE must be preserved, so this mode will require a
new context type (PM_CONTEXT_UNIFIED). Over time, we should plan to
move towards defaulting to this mode.
pmNewContext and pmSetMode are the most obviously affected APIs, and the
__pmContext structure will need to acquire state tracking capabilities
for multiple sub-contexts within a single unified context.
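To make the sub-context idea a little more concrete, here is one possible
shape for that state, plus the lookup that decides which sub-context serves
a given timestamp. Everything in it is an assumption for discussion:
unified_context_t, unified_archive_t, unified_select and UNIFIED_LIVE are
made-up names and not part of __pmContext. The policy of preferring the
later-starting archive where two overlap is just one option, not a settled
answer to the overlap issue.

```c
#include <assert.h>
#include <stddef.h>

#define UNIFIED_LIVE (-1)   /* hypothetical: use the live (host) sub-context */

/* One archive sub-context and the time span it covers (hypothetical). */
typedef struct {
    double start;   /* first timestamp in the archive, seconds since epoch */
    double end;     /* last timestamp currently in the archive */
    int    ctx;     /* handle of the underlying archive context */
} unified_archive_t;

/* State a unified context might need to track (hypothetical). */
typedef struct {
    unified_archive_t *archives;   /* sorted by ascending start time */
    size_t             narchives;
    int                live_ctx;   /* handle of the live host context */
    int                current;    /* archive index in use, or UNIFIED_LIVE */
} unified_context_t;

/*
 * Pick the sub-context for timestamp t:
 *  - an archive whose span contains t (the later-starting one on overlap);
 *  - otherwise the earliest archive starting after t (t falls in a gap,
 *    or before the first archive);
 *  - otherwise UNIFIED_LIVE (t is past the end of every archive).
 */
int
unified_select(const unified_context_t *uc, double t)
{
    size_t i;
    int covering = UNIFIED_LIVE, next = UNIFIED_LIVE;

    assert(uc != NULL);

    for (i = 0; i < uc->narchives; i++) {
        const unified_archive_t *a = &uc->archives[i];

        if (t >= a->start && t <= a->end)
            covering = (int)i;          /* later entries override earlier */
        else if (t < a->start && next == UNIFIED_LIVE)
            next = (int)i;              /* first archive beginning after t */
    }
    return (covering != UNIFIED_LIVE) ? covering : next;
}
```

A forward fetch that hits the end of an archive would re-run this lookup
with the next sample time and switch sub-contexts if the answer changes.
For an archive pmlogger is still writing, the end field would need
refreshing before each lookup.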
This doesn't explicitly tackle another related issue, namely that once
data has been fetched live (esp. by pmchart) people would like to be
able to automatically scan back to the data they recently had. We can
possibly tackle that via pmimport APIs though? That could even be done
automatically, as part of the semantics of using a unified context - a
new per-user archive could be created in libpcp, as the explicit record
mode in pmchart does now ($HOME/.pcp/pmlogger/)? Not sure on that one
but it would be good to solve that problem too.
No doubt, many other issues are lurking. :) Solutions will need to be
found, so the sooner we know about 'em all the better! This is quite a
difficult problem, and a big interface extension - so would probably
trigger a major version bump (PCP 4.0) when it lands I guess.
Ideas, Alternatives:
- mgoodwin has suggested maybe this could be all done through use of
the existing PM_CONTEXT_HOST and allowing options like -S, -T, -A
and so on to trigger automatic PM_CONTEXT_ARCHIVE creation within
libpcp as needed. As above, I tend to disagree (re back-compat),
but if it could be done that would be a more seamless model. My
current thinking is we would introduce a new context type, and an
associated new command line option to many tools (-u <host>) that
would enable this mode (and over time, become default, while also
keeping the existing modes). This gives some opportunity to move
away from -h <host>, which many folks have expressed disapproval
of (the option "-h" that is), and also gives opportunity for us to
reconsider the host specification syntax (which is also not loved
universally).
- fche has also suggested that PM_CONTEXT_HOST be retained as-is and
we instead build this new functionality into pmcd, such that pmcd
does the archive mode fetches on behalf of clients and serves up
both live and historical data. My initial thoughts there are that
this approach may introduce many new problems - pmcd having to do disk
I/O (and a lot of it) is not giving me a warm and fuzzy feeling at all,
nor is the suggestion to add threading to it to help with that. We do
however have pmproxy up our sleeves - it is a client too, so could
be extended to serve both live/archive data (for remote historical
data), and I'd be sweating a lot less about making pmproxy threaded
rather than pmcd. pmproxy would then be more akin to pmwebd which
does both live/archive JSON requests already.
- ... plan D?
--
Nathan