Hi -
> > As noted before, there was nothing unusual about my procfs
> > configuration. Your existing cgroups-root-001 tarball should show the
> > exact same problems. It was the test code that has been deficient
> > (not doing enough operations to hit the fd-leak/exhaustion limits, for
> > example), not the data.
>
> *sigh*, no - I'm looking to expand the testing coverage and you clearly
> had many more cgroups setup than I did & on a more recent kernel version
> where we have no test coverage yet. [...]
I looked into this further. As promised, the problem is reproducible
with the existing cgroups-root-001 tarball, but the way the test case
is constructed makes that harder than it should be. The test uses the
.so pmda variant and pminfo -L/-K runs, so that it can force-feed the
pmda the fake /proc data via $PROC_STATSPATH.
In an echo of early problems with the papi-pmda qa, this style makes
leaks difficult to find, because they are so short-lived: you can't
just run pminfo in a loop to exhaust the resources, because they are
recreated anew for each pminfo run! If the fake-/proc force-feeding
widget used a mechanism that worked through pmcd.conf / pmcd, the test
would be more realistic.
Anyway, it is possible to trigger the problem even with the .so pmda
variant, just more clumsily. Behold pcpfans.git fche/cgroups-test:
diff --git a/qa/730 b/qa/730
index 15432438c03f..27940e67d819 100755
--- a/qa/730
+++ b/qa/730
@@ -21,6 +21,9 @@ trap "cd $here; rm -rf $tmp.*; exit \$status" 0 1 2 3 15
# real QA test starts here
root=$tmp.root
+
+# NB: .so-style pmda used here so the $PROC_STATSPATH fake /proc can
+# be used. This makes it more difficult to catch leaks in the PMDA.
export PROC_STATSPATH=$root
pmda=$PCP_PMDAS_DIR/proc/pmda_proc.so,proc_init
@@ -32,10 +35,26 @@ do
tar xzf $tgz
base=`basename $tgz`
+ # Assemble a command line sufficient to trigger the fd leak fixed
+ # in commit #680015162. If the .so-style PMDA were not used, then
+ # an ordinary pminfo -f loop could do it.
+ manycgroups=""
+ for a in 1 2 3 4 5 6 7 8 9 10; do
+ for b in 1 2 3 4 5 6 7 8 9 10; do
+ for c in 1 2 3 4 5 6 7 8 9 10; do
+ for d in 1 2; do
+ manycgroups="$manycgroups cgroup"
+ done
+ done
+ done
+ done
+
echo "== Checking namespace and metric numbering - $base"
pminfo -L -K clear -K add,3,$pmda cgroup
echo "== Checking metric descriptors and values - $base"
pminfo -L -K clear -K add,3,$pmda -d -f cgroup
+ echo "== Checking many values - $base"
+ pminfo -L -K clear -K add,3,$pmda -f $manycgroups | sort | uniq -c
echo "== Checking on an individual metric fetch - $base"
pminfo -L -K clear -K add,3,$pmda -f cgroup.blkio.dev.time
echo && echo "== done" && echo
and indeed with the pre-fd-leak-fix code, one gets ...
58779 No value(s) available!
2 pmNameIndom: indom=3.21 inst=2: Unknown or illegal instance identifier
7 pmNameIndom: indom=3.24 inst=2: Unknown or illegal instance identifier
instead of the correct ...
4000 No value(s) available!
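
To spell out the arithmetic: the nested loops put 10*10*10*2 = 2000
copies of "cgroup" on a single pminfo command line, and the good output
of 4000 lines works out to two "No value(s) available!" lines per copy
with this tarball. A minimal sketch of an equivalent way to build and
count that list, in case the four-deep loop looks too opaque (the names
i and count are mine, not from qa/730):

```shell
# Build the same argument list as the nested loops: 10*10*10*2 words.
manycgroups=""
i=0
while [ "$i" -lt 2000 ]; do
    manycgroups="$manycgroups cgroup"
    i=$((i + 1))
done

# Count the words in a subshell so the caller's positional
# parameters are left alone.
count=$(set -- $manycgroups; echo $#)
echo "$count"
```

Either form gives the one pminfo process enough fetches to run into the
leak before it exits.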
- FChE