Re: [pcp] NSS/NSPR Testing Status

To: Dave Brolley <brolley@xxxxxxxxxx>, Ken McDonell <kenj@xxxxxxxxxxxxxxxx>, "Frank Ch. Eigler" <fche@xxxxxxxxxx>
Subject: Re: [pcp] NSS/NSPR Testing Status
From: Nathan Scott <nathans@xxxxxxxxxx>
Date: Wed, 28 Nov 2012 20:11:37 -0500 (EST)
Cc: pcp@xxxxxxxxxxx
In-reply-to: <50B62698.4080501@xxxxxxxxxx>
Reply-to: Nathan Scott <nathans@xxxxxxxxxx>
Hi Dave,

> -------- Original Message --------
> ...
> As of commit e110b05e44b0678d9ba632ab5896ccc7e50c3cf6 (pcpfans
> brolley/nssmerge), I am down to two differences in the qa test
> results when running with NSS/NSPR enabled as opposed to without.
> 

(great to hear!)

> They are:
> 
> 387 -- pmnsunload: memory leak in NSPR found by valgrind
> 
> As we have discussed, this is memory being allocated by PR_Socket (from
> within __pmCreateSocket). It is memory within NSPR's file descriptor
> cache, which is used by NSPR to recycle old file descriptors. This would
> likely be solved by a call to PR_Cleanup, but the question is, from
> where? We call PR_Init from within __pmCreateSocket, as needed (modern
> versions of NSPR are self-initializing on the first API call). Any
> program which calls __pmCreateSocket would need to eventually call
> PR_Cleanup before exiting in order to be leak-free, according to
> valgrind. It may be a bit pedantic to explicitly clean up just before
> exiting, but we could provide a __pmCleanup function which programs
> could call before exiting. With NSS/NSPR enabled, it would call
> PR_Cleanup; otherwise it would do nothing. I suppose there is the
> possibility that libpcp could need a cleanup function for non-NSS/NSPR
> purposes someday.

I think it's worth considering doing this cleanup.  While we could
guard against it in the QA test (adding valgrind filters), which
is fine for us, it's not really "nice" for someone using the PMAPI
and valgrind outside the PCP tree.

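Something along these lines is what I'm imagining - a rough sketch
only; the __pmCleanup name, the HAVE_NSS guard and the header path
are placeholders, not existing libpcp code:

#ifdef HAVE_NSS
#include <prinit.h>       /* PR_Cleanup */
#endif

void
__pmCleanup(void)
{
#ifdef HAVE_NSS
    /* release NSPR's fd cache and other library-wide state */
    PR_Cleanup();
#endif
    /* nothing to do (yet) for the native socket implementation */
}
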
An alternative to a new cleanup routine would be to add (another)
atexit handler in libpcp.  I'm not a huge fan of that, but would
be very interested in other opinions.  This mechanism is how the
local context endCallBack hook is implemented (connectlocal.c).

There's only a limited number of guaranteed atexit slots, so we
shouldn't really go eating them all up without the caller knowing
about it.  Maybe we need a single atexit handler which handles all
of libpcp's atexit needs (and this could call the new __pmCleanup/
__pmShutdown/... routine, which could be exposed so clients can
opt in as well).

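To make that concrete, here is a sketch of the sort of single,
consolidated hook I mean - all names are illustrative, and real code
would take the libpcp lock rather than rely on a bare flag:

#include <stdlib.h>

extern void __pmShutdown(void);   /* the proposed cleanup routine above */

static int exit_handler_registered;

static void
__pmAtExitHandler(void)
{
    /* one place for all of libpcp's exit-time work */
    __pmShutdown();
}

static void
__pmRegisterExitHandler(void)
{
    /* register at most once, so we only consume a single atexit slot */
    if (!exit_handler_registered) {
        exit_handler_registered = 1;
        atexit(__pmAtExitHandler);
    }
}
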
Thread-safety is all interwoven in this, so it would be good to
garner Ken's thoughts on this one.  Also, I think it is only on
Linux that atexit handlers run during shared library unload, so
MacOSX/Solaris/... may be exposed to subtle bugs when libpcp is
explicitly loaded/unloaded.  An explicit shutdown interface could
provide an opportunity to avoid that.

> 578 -- test harness expects monotonically increasing fd numbers
> 
> The openfds metric of pmcd is a bit misleading. It implies the number
> of currently open file descriptors, but it is reported based on the
> value of pmcd_hi_openfds (src/pmcd/src/pmcd.h), which is actually a
> high-water mark of the highest ever open fd number. It is perhaps
> coincidentally true that native fd numbers are monotonically increasing
> and that, after installation, the number of open file descriptors is
> also the highest fd number, as the test expects. However, with NSS/NSPR
> enabled, while the native fd numbers are still in the range 0-1023, the
> NSPR ones are in the range 1024-2047, and so this assumption is no
> longer true.
> 
> I'm thinking that we should fix the metric to actually count the number
> of open fds (in the IPC table?). Can you see another solution?

Not really; as mentioned, this metric is defined as follows:

$ pminfo -Tt pmcd.openfds

pmcd.openfds [highest PMCD file descriptor]
Help:
The highest file descriptor index used by PMCD for a Client or PMDA
connection.

So its name is a little misleading.  I liked Frank's suggestion
though, which was:

| Counting actual open fd's is not that hard even in general POSIX -
| iterate over the valid fd range, calling dup(i); if it returns >=0,
| close it and increment the openfds counter.  With the IPC table, it
| sounds even easier and sounds like the way to go.

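A rough sketch of that loop (illustrative only, not the actual pmcd
fetch code; the upper bound would come from sysconf(_SC_OPEN_MAX) or
similar):

#include <unistd.h>

/* count file descriptors currently open in this process */
static int
count_open_fds(void)
{
    long maxfd = sysconf(_SC_OPEN_MAX);
    int  i, fd, count = 0;

    for (i = 0; i < maxfd; i++) {
        fd = dup(i);
        if (fd >= 0) {      /* fd i is open */
            close(fd);
            count++;
        }
    }
    return count;
}
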
It's more expensive during the fetch method, but more accurate.
I think Ken may have written this code originally though, so it
would be good to seek his thoughts again before reimplementing it.

cheers.

--
Nathan
