
Re: pcp updates: pmdaproc, cgroups, books

To: Nathan Scott <nathans@xxxxxxxxxx>
Subject: Re: pcp updates: pmdaproc, cgroups, books
From: "Frank Ch. Eigler" <fche@xxxxxxxxxx>
Date: Fri, 12 Dec 2014 11:40:33 -0500
Cc: pcp developers <pcp@xxxxxxxxxxx>
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <53646500.16198226.1418365398316.JavaMail.zimbra@xxxxxxxxxx>
References: <1309338393.770280.1416292315684.JavaMail.zimbra@xxxxxxxxxx> <1468616216.771853.1416292522829.JavaMail.zimbra@xxxxxxxxxx> <y0mvbmc8kg9.fsf@xxxxxxxx> <1249426361.1589601.1416371161222.JavaMail.zimbra@xxxxxxxxxx> <20141119181439.GF5700@xxxxxxxxxx> <1666386574.2247920.1416427865663.JavaMail.zimbra@xxxxxxxxxx> <2132304544.16180073.1418360958577.JavaMail.zimbra@xxxxxxxxxx> <20141212061823.GC14953@xxxxxxxxxx> <53646500.16198226.1418365398316.JavaMail.zimbra@xxxxxxxxxx>
User-agent: Mutt/1.4.2.2i
Hi -

> > As noted before, there was nothing unusual about my procfs
> > configuration.  Your existing cgroups-root-001 tarball should show the
> > exact same problems.  It was the test code that has been deficient
> > (not doing enough operations to hit the fd-leak/exhaustion limits, for
> > example), not the data.
> 
> *sigh*, no - I'm looking to expand the testing coverage and you clearly
> had many more cgroups setup than I did & on a more recent kernel version
> where we have no test coverage yet.   [...]

I looked into this further.  As promised, the problem is reproducible
with the existing cgroups-root-001 tarball, but this is made more
difficult by the test case's construction.  This test uses the .so
pmda variant & pminfo -L/-K runs, so that the test case can force-feed
it the fake /proc data via $PROC_STATSPATH.

In an echo of earlier problems with the papi-pmda qa, this style makes
leaks difficult to find, because they are so ephemeral: you can't just
run a pminfo loop to exhaust the resources, because the PMDA (and
everything it has leaked) is created anew for each pminfo!  If the
fake-/proc force-feeding widget used a mechanism that allowed it to
pass through pmcd.conf / pmcd, it would be more realistic.
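
To illustrate the point about ephemeral leaks, here is a minimal sketch
in plain bash.  It is not PCP code: leak_in_one_process and
leak_in_fresh_processes are hypothetical names, and the "leak" is just
bash's {var} redirection (bash 4.1 or later assumed) opening /dev/null
and never closing it.

```shell
# Hypothetical demo, not part of PCP: leak one descriptor per iteration.

# All n leaks happen inside ONE process, much like a PMDA daemon that
# stays alive under pmcd.  Prints how far the last leaked fd number is
# above the first one, so the value grows with n.
leak_in_one_process() {
    bash -c 'exec {first}</dev/null; last=$first
             for ((i = 1; i < $1; i++)); do exec {last}</dev/null; done
             echo $((last - first))' _ "$1"
}

# Each leak happens in a brand-new process, much like one pminfo -L run
# per loop iteration.  Prints the number of distinct fd numbers seen.
leak_in_fresh_processes() {
    i=0
    while [ "$i" -lt "$1" ]; do
        bash -c 'exec {fd}</dev/null; echo $fd'
        i=$((i + 1))
    done | sort -u | wc -l | tr -d ' '
}
```

In the first function the descriptor numbers keep climbing, so a large
enough n would eventually hit the fd limit.  In the second, every new
process starts from the same descriptor table, so the leak never
accumulates no matter how many times the loop runs.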

Anyway, it is possible to trigger the problem even with the .so pmda
variant, just more clumsily.  Behold pcpfans.git fche/cgroups-test:


diff --git a/qa/730 b/qa/730
index 15432438c03f..27940e67d819 100755
--- a/qa/730
+++ b/qa/730
@@ -21,6 +21,9 @@ trap "cd $here; rm -rf $tmp.*; exit \$status" 0 1 2 3 15
 
 # real QA test starts here
 root=$tmp.root
+
+# NB: .so-style pmda used here so the $PROC_STATSPATH fake /proc can
+# be used.  This makes it more difficult to catch leaks in the PMDA.
 export PROC_STATSPATH=$root
 pmda=$PCP_PMDAS_DIR/proc/pmda_proc.so,proc_init
 
@@ -32,10 +35,26 @@ do
     tar xzf $tgz
     base=`basename $tgz`
 
+    # Assemble a command line sufficient to trigger the fd leak fixed
+    # in commit #680015162.  If the .so-style PMDA were not used, then
+    # an ordinary pminfo -f loop could do it.
+    manycgroups=""
+    for a in 1 2 3 4 5 6 7 8 9 10; do
+        for b in 1 2 3 4 5 6 7 8 9 10; do
+            for c in 1 2 3 4 5 6 7 8 9 10; do
+                for d in 1 2; do
+                    manycgroups="$manycgroups cgroup"
+                done
+            done
+        done
+    done
+
     echo "== Checking namespace and metric numbering - $base"
     pminfo -L -K clear -K add,3,$pmda cgroup
     echo "== Checking metric descriptors and values - $base"
     pminfo -L -K clear -K add,3,$pmda -d -f cgroup
+    echo "== Checking many values - $base"
+    pminfo -L -K clear -K add,3,$pmda -f $manycgroups | sort | uniq -c
     echo "== Checking on an individual metric fetch - $base"
     pminfo -L -K clear -K add,3,$pmda -f cgroup.blkio.dev.time
     echo && echo "== done" && echo


and indeed with the pre-fd-leak-fix code, one gets ...

  58779 No value(s) available!
      2 pmNameIndom: indom=3.21 inst=2: Unknown or illegal instance identifier
      7 pmNameIndom: indom=3.24 inst=2: Unknown or illegal instance identifier

instead of the correct ...

   4000 No value(s) available!


- FChE
