
Re: pcp updates: dynamic metrics, pmdapapi, containers, qa

To: Nathan Scott <nathans@xxxxxxxxxx>
Subject: Re: pcp updates: dynamic metrics, pmdapapi, containers, qa
From: fche@xxxxxxxxxx (Frank Ch. Eigler)
Date: Tue, 13 Jan 2015 21:27:33 -0500
Cc: pcp@xxxxxxxxxxx
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <237009758.17657825.1418625267689.JavaMail.zimbra@xxxxxxxxxx> (Nathan Scott's message of "Mon, 15 Dec 2014 01:34:27 -0500 (EST)")
References: <494364595.14790394.1418280789809.JavaMail.zimbra@xxxxxxxxxx> <1300433870.14907994.1418289576168.JavaMail.zimbra@xxxxxxxxxx> <y0mlhmejmcc.fsf@xxxxxxxx> <237009758.17657825.1418625267689.JavaMail.zimbra@xxxxxxxxxx>
User-agent: Gnus/5.1008 (Gnus v5.10.8) Emacs/21.4 (gnu/linux)
>> > [...]
>> > Nathan Scott (1):
>> >       libpcp/pmcd: protocol-level support for containers
>> > [...]
>> 
>> Can you explain further your intended design for this, so [...]

> We're discussing here the implementation of:
> http://www.pcp.io/pipermail/pcp/2014-June/005030.html
> ... for which the above was the first, preparatory commit of several
> pending commits. [...]

That message outlined the general need for, and rough workings of, the
root-privileged namespace-fd-passing widget.  We have since gone quite
a bit beyond that point.


> [...] Even post-merge/release, "cast in stone" is not really a
> concrete thing [...]

Sorry about the wording.  It must have come from observing other work
that was committed without branching or prior review, followed by a
neglect to address post-facto comments; let's assume that was an
aberration and move on.


> [...] and I expect the rest to arrive this week.  It's not going to
> be conducive to design-by-committee at this stage, I'm afraid (if
> ever it was - plenty of experimental, exploratory work there).  But
> there will be several weeks between merge/release for wider review,
> testing and sharing of thoughts/ideas/improvements. [...]
> The answers to your other more specific questions will become clear as
> the code arrives.  [...]

Well, let's get down to it.  There doesn't appear to be a design
document or RFC to help address those first questions, so let's
proceed by reverse-engineering the code.  (Please excuse any
misreadings evident in the following.)  A lot of good tricky stuff
has been done, leaving only a few issues for now:


Container naming.  The reason I brought this aspect up a few weeks ago
is that naming is a serious and long-lasting kind of policy that we
should get right in a future-proof way.  For example, this means we
should avoid hard-coding policies based on current-day fashions
(specifically: particular heuristics for current versions of
particular container drivers).  I see a couple of concerns here.

The "container name" is not a defined term.  A PCP user cannot rely on
any particular fixed interpretation.  This is made worse by the absence
of namespacing (no "docker:" prefix, for example), so the same name
could be matched by more than one container_driver_t.  Ah, but it's
cached?  That's not good enough - the cache could/should become
invalidated if the heuristic fuzzy-name-matching lookup becomes
ambiguous due to later changes to the container indom.  Ah, maybe the
indom pmdacache will be refreshed coincidentally?  That's not great
either - then an ambiguous name may be mapped to different
containers/drivers across a sequence of operations within the life of
a single pmcontext.

The gist of the above is that the naming/lookup/caching facility,
while appearing quite user-friendly for quick one-shot jobs, is IMHO
counterproductive for robust future-proof usage.  IMO it would be
better if:
- the container-name to namespace-pid/fd mapping were done once, at
  client connection time
- non-matching mappings were rejected
- ambiguous mappings were rejected
- an (optional?) prefix identifying the type of container were
  supported (to defeat ambiguities)
- container-names did not need to be stored in pmda contexts

The docker implementation's search for containers includes "exited"
containers too, not just those that are running.  Is that done in
anticipation of someone restarting the container in the future?  For
dead containers, there is no pid/fd for pmdaroot to attach to, so the
mapping cannot fully succeed.  There's some attractiveness to letting
users specify -future- container names at pcp connection time, but
then we lose the ability to robustly reject erroneous names.  Dunno.

Have you considered eschewing docker specialization entirely in the
code, and adopting a policy whereby a "container-name" is just a
subdirectory under /sys/fs/cgroup?  (To enter the namespace of a
process trapped in such a cgroup, one needs only to open the
"cpu/tasks" file, pick a live PID, hop over to its "/proc/PID/ns".)
Then, this code would automagically work for the sort of
mini-containers produced by systemd for services, and probably
libvirt.  (The code could contain heuristics for making the
cgroup-naming conventions more friendly, like the HASH ->
/system.slice/docker-HASH.scope/ needed for the docker-under-systemd
case.)


Second issue.  As you know, Linux containers are a mishmash of the
cgroup and namespace technologies.  The new containers machinery
appears to deal exclusively with the namespace half (so the
proc/network/etc. pmdas can switch into those namespaces to query the
relevant introspection info).

But there appears to be no connection to cgroup stats!  That's a
shame: the bulk of the performance information about a container can
be found right there.  When the user specifies "pmCLIENT
--container=foo", it would be nice if the corresponding cgroup.*
indom-subset (if any) were made available as a convenient alias ... or
some other cross-referencing machinery were provided.  (Note that the
cgroups indom is based on the clear semantics of cgroupfs
subdirectories, so there is some precedent for the proposal above.)


Third issue, not really serious, but I wonder how the third-party
jsmn library came to be quietly bundled into our sources.  Have you
considered using a platform library like json-parser instead?


Fourth issue, I don't know if it's a bug, but "pminfo --container=JUNK
..." happily prints information about the host, without any
diagnostic.  (For that matter, I was not able to get any legal-looking
container hash to work either.)  The absence of documentation
(pcpintro?  __pmparsehostattrsspec?) is acute.



Anyway, lots of good work done, thanks for the big leap toward
container-introspection support!


- FChE
