
Re: pcp updates: pmdaproc, cgroups, books

To: "Frank Ch. Eigler" <fche@xxxxxxxxxx>
Subject: Re: pcp updates: pmdaproc, cgroups, books
From: Nathan Scott <nathans@xxxxxxxxxx>
Date: Tue, 18 Nov 2014 23:26:01 -0500 (EST)
Cc: pcp <pcp@xxxxxxxxxxx>
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <y0mvbmc8kg9.fsf@xxxxxxxx>
References: <1309338393.770280.1416292315684.JavaMail.zimbra@xxxxxxxxxx> <1468616216.771853.1416292522829.JavaMail.zimbra@xxxxxxxxxx> <y0mvbmc8kg9.fsf@xxxxxxxx>
Reply-to: Nathan Scott <nathans@xxxxxxxxxx>
Thread-index: CZ1Qm8WTVaHUaq9ySf5zgQwnPGJr1Q==
Thread-topic: pcp updates: pmdaproc, cgroups, books

----- Original Message -----
> > [...]
> > Nathan Scott (2):
> >       pmda proc: rework existing per-cgroup metrics, and add new ones
> > [...]
> 
> Nice work!  Tried it out; some observations:
> 

Thanks.

>      [...] And indeed lsof shows many open files, in this case

Yep, nasty - good find.

> #3.  From looking more at strace and the code, it looks as though the pmda
>      might be doing too much work per unit fetch.
> [...]
> #4.  Instance domain unawareness.  A variant of #3, when a pcp client is
> [...]
> #5.  Multiple metrics unawareness (so to speak).  A variant of #3,

There are subtleties to the protocol you may not be following here
and that's misleading you in interpreting the observed results, I
think.  The instance PDUs are a separate entity to fetch PDUs,
but the PMDA needs to take similar refresh actions to respond to
each.
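
As a rough illustration of that last point, here is a toy model in
Python.  It is only a sketch: ToyAgent, refresh and the two handler
names are made up for this example and are not the libpcp_pmda API.
It just shows why a client that asks for the instance list and then
fetches values can cause two refreshes, not one.

```python
class ToyAgent:
    """Hypothetical stand-in for a PMDA; not the real PCP interfaces."""

    def __init__(self, read_cgroups):
        # read_cgroups: callable returning {instance_name: value}
        self.read_cgroups = read_cgroups
        self.refreshes = 0
        self.cache = {}

    def refresh(self):
        # Re-read the underlying data; in a real agent this is the costly part.
        self.refreshes += 1
        self.cache = dict(self.read_cgroups())

    def handle_instance_request(self):
        # Answering an instance request needs current instance names.
        self.refresh()
        return sorted(self.cache)

    def handle_fetch_request(self, wanted=None):
        # Answering a fetch request needs current values, so refresh again.
        self.refresh()
        names = sorted(self.cache) if wanted is None else wanted
        return {name: self.cache[name] for name in names if name in self.cache}
```

Calling handle_instance_request() followed by handle_fetch_request()
leaves the refreshes counter at 2, even though the client only
displays one set of values.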

> (Some of these may be cases of false confidence based on testing
> primarily against the pcpqa artificial system files as in

Nah, at first glance they mostly seem to be misunderstandings about
the PCP protocol, as issued by pminfo/pmval.

> linux/cgroups-root*.tgz.  It's clever & useful, but cannot be as
> thorough.)

Oh, it can be far more thorough than we could ever hope to be with
testing by-hand; if people contribute to the testsuite and build it
up over time with more cgroup scenarios, more output from different
kernel versions, different platforms, etc, etc ... there is no way
testing by-hand will ever beat that kind of coverage.

Almost anything one can test by-hand can be scripted, and in most
cases any by-hand testing is throw-away effort that could have been
(& should have been) contributed to the testsuite.

In this particular case, everything is reproducible and readily QA-
automated.  I imagine you simply have many more active cgroups than
I did on my dev box when I created cgroups-root-001.tgz, and hence
I don't trip that thar fd-limit.  Could you tar up the contents of
/proc/{cgroups,mounts,diskstats,stat} and /sys/fs/cgroup from this
machine (let's call it cgroups-root-002.tgz), and send it through to
me please?
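
If it helps, something along these lines should do it.  This is a
sketch only; the snapshot() helper is made up for this mail, and the
member layout inside the archive is an assumption, not necessarily
what the existing cgroups-root-001.tgz uses.  It reads each file
explicitly because /proc and /sys files typically report a size of
zero, so an archiver that trusts the reported size can store them
empty.

```python
import io
import os
import tarfile


def snapshot(paths, out_path):
    """Archive the given files and directory trees into a gzipped tarball.

    File contents are read into memory first, so the stored size matches
    what was actually read rather than what stat() reports.
    """
    with tarfile.open(out_path, "w:gz") as tar:
        for top in paths:
            if os.path.isdir(top):
                # os.walk does not descend into symlinked directories.
                files = [os.path.join(dirpath, name)
                         for dirpath, _, names in os.walk(top)
                         for name in names]
            else:
                files = [top]
            for path in files:
                try:
                    with open(path, "rb") as fh:
                        data = fh.read()
                except OSError:
                    continue  # skip write-only or otherwise unreadable files
                info = tarfile.TarInfo(path.lstrip("/"))
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))
```

Usage would be along the lines of
snapshot(["/proc/cgroups", "/proc/mounts", "/proc/diskstats",
"/proc/stat", "/sys/fs/cgroup"], "cgroups-root-002.tgz").  Note that
symlinked directories under /sys/fs/cgroup are not followed, and
file modes and ownership are not preserved.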

thanks.

--
Nathan
