
Re: pcp updates: pmdaproc, cgroups, books

To: "Frank Ch. Eigler" <fche@xxxxxxxxxx>
Subject: Re: pcp updates: pmdaproc, cgroups, books
From: Nathan Scott <nathans@xxxxxxxxxx>
Date: Mon, 15 Dec 2014 01:36:03 -0500 (EST)
Cc: pcp developers <pcp@xxxxxxxxxxx>
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <20141212164033.GD14953@xxxxxxxxxx>
References: <1309338393.770280.1416292315684.JavaMail.zimbra@xxxxxxxxxx> <1249426361.1589601.1416371161222.JavaMail.zimbra@xxxxxxxxxx> <20141119181439.GF5700@xxxxxxxxxx> <1666386574.2247920.1416427865663.JavaMail.zimbra@xxxxxxxxxx> <2132304544.16180073.1418360958577.JavaMail.zimbra@xxxxxxxxxx> <20141212061823.GC14953@xxxxxxxxxx> <53646500.16198226.1418365398316.JavaMail.zimbra@xxxxxxxxxx> <20141212164033.GD14953@xxxxxxxxxx>
Reply-to: Nathan Scott <nathans@xxxxxxxxxx>

----- Original Message -----
> > > As noted before, there was nothing unusual about my procfs
> > > configuration.  Your existing cgroups-root-001 tarball should show the
> > > exact same problems.  It was the test code that has been deficient
> > > (not doing enough operations to hit the fd-leak/exhaustion limits, for
> > > example), not the data.
> > 
> > *sigh*, no - I'm looking to expand the testing coverage and you clearly
> > had many more cgroups setup than I did & on a more recent kernel version
> > where we have no test coverage yet.   [...]
> 
> I looked into this further.  As promised, the problem is reproducible
> with the existing cgroups-root-001 tarball, 

That's great - thanks for looking into it & extending the test.

Setting that aside briefly, coverage would still be improved by the tgz
I'm asking for.  Relative to the single cgroups-001.tgz case I made, it'd
give coverage for a different kernel version (the contents of some of the
cgroups files differ in more recent kernels).

> [...] but this is made more
> difficult by the test case's construction.  This test uses the .so
> pmda variant & pminfo -L/-K runs, so that the test case can force-feed
> it the fake /proc data via $PROC_STATSPATH.
> 
> In an echo of early problems with the papi-pmda qa, this style makes
> leaks difficult to find, because they are so ephemeral: you can't just
> do a pminfo loop to exhaust the resources, because they are recreated
> anew for each pminfo!

The problem is the choice of client tool, not an inherent limitation of
this style of testing as you're suggesting.  The case you're reproducing
here calls for a long-running client that issues many fetches, but you're
coming at it with pminfo, which sets everything up afresh on each run -
try pmval instead?

> Anyway, it is possible to trigger the problem even with the .so pmda
> variant, just clumsier.  Behold pcpfans.git fche/cgroups-test:

Yep, that's very awkward, and I think your rationale (as in the test
comments added there) is not quite right.  It could be implemented
more cleanly via pmval, without relying on the batch-fetch logic in
pminfo, as I think this change does.

It'd be excellent to get this long-running client case going with use
of valgrind too (i.e. qa/731, not just the qa/730 modified here) --
pmval with a short sampling interval (a millisecond or two) would do
the trick I think and it'd be a good addition to both scripts.  More
readable too, for the next person working on these tests.
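
Something along these lines is what I have in mind; a rough sketch only,
not tested against the tree.  It assumes pmval accepts the same -L/-K
local-context options as the pminfo runs in the test, and the metric name,
sample count, pmda path and fake /proc directory below are placeholders
I've made up for illustration.  The helper just prints the command line, so
the one long-running pmval process can be run plain in one script and
under valgrind in the other.

```shell
#!/bin/sh
# Sketch: a single long-running pmval process issuing many fetches against
# the .so pmda variant, so any leaked fds accumulate within one process.
# Assumed (not from this thread): metric name, sample count, pmda path.

# Print a pmval command line: samples, interval (seconds), pmda .so, metric.
pmval_cmd()
{
    samples=$1
    interval=$2
    pmda_so=$3
    metric=$4
    # -L: local context; -K clear/add: load only the proc pmda (domain 3)
    echo "pmval -L -K clear -K add,3,$pmda_so,proc_init -s $samples -t $interval $metric"
}

# 2000 fetches, 2 milliseconds apart (placeholder values)
cmd=`pmval_cmd 2000 0.002 /var/lib/pcp/pmdas/proc/pmda_proc.so cgroup.subsys.count`
echo "$cmd"

# To run it against fake /proc data (directory name is hypothetical),
# plain and then under valgrind:
#   PROC_STATSPATH=/tmp/cgroups-root-001 $cmd
#   PROC_STATSPATH=/tmp/cgroups-root-001 valgrind --leak-check=full $cmd
```

The sample count only needs to be large enough that a per-fetch fd leak
would run into the process fd limit before pmval exits.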

cheers.

--
Nathan
