On Wed, 2012-06-20 at 12:46 -0400, Dave Brolley wrote:
> Since this is my first post to this list, let me first introduce myself.
> My name is Dave Brolley and I am a software engineer at Red Hat in
> Toronto, Canada. I've been at Red Hat for almost 15 years with the past
> 5 years or so spent working on the RHEL/Fedora tools team with my main
> focus being on Systemtap.
Welcome ... I spent 3 winters in Alberta, and our twin daughters were
born in Edmonton, so I have a long-standing association with Canada. And
your stint at Red Hat commenced just a few years after we started work
on PCP ... 8^)>
> I have been given the assignment of adding SSL/TLS capability to the
> various components of PCP. This would give clients and agents the option
> of using a secure connection with the pmcd if desired. Having started
> the work, I wanted to present my approach for feedback, to make sure
> that the approach is sound and that the work will stand some chance of
> being accepted into the project.
I know next to nothing about SSL/TLS, but fortunately I know a lot
more about PCP, and my comments below reflect that.
> All of the work described below can be found on the brolley/nss branch
> in our pcp repository at git://sourceware.org/git/pcpfans.git
> The Approach:
As Max has suggested, I think it would have been good to have this
discussion a little earlier in terms of the overall approach and
> The use of NSS/NSPR on its own does not provide SSL/TLS connections. The
> client and server must both agree to and expect the use of a secure
> connection. The idea would be to have an insecure connection made as
> normal and then to provide some sort of protocol by which the client
> and/or server can request an upgrade to a secure connection (think
> STARTTLS). In this way existing pmcd clients and agents can still
> connect to pmcd without change and future clients and agents can still
> use insecure connections if they choose.
I think we're in good shape for the client <--> pmcd communication, as
there is already an initial handshake where we exchange PDU version and
other info. For the other communication types (much more on this
below), we don't have a comparable hook.
> The biggest issue I have run across is the
> assumption that a file descriptor is an integer. This assumption is
> pervasive throughout PCP and is manifested in the following ways:
> - returning integer return codes on error and a file descriptor upon
> success from many functions
> - indexing arrays using file descriptors
> - assuming that file numbers 0, 1 and 2 are stdin, stdout and stderr
> - printing file descriptors as integers
> - keeping track of the "max" file descriptor
>   - for the purpose of calls to 'select'
>   - for the purpose of iteration
> This has resulted in some abstractions designed to remove this
> assumption from the mainline code. For example __pmUpdateMaxFD
> A second issue is that many components use sockets, files and pipes
> seamlessly and that file descriptors representing all of these are
> passed to some functions. As a result, these functions must use the
> abstracted data types and I have had to extend the POSIX-NSS/NSPR
> abstraction to normal file and pipe I/O in many cases. I have, so far,
> left as much code as I can unabstracted, however, for the sake of
> consistency, it may make sense to convert all I/O (at least within the
> pmcd) to use the abstraction layer.
As you'll see from my comments below, I think we have a problem here ...
and I believe there may be a better solution.
> the next area of focus. I understand that there is a test harness at
> I will likely need help in getting things set up on my available
> platforms and will certainly need assistance in testing on platforms
> that are not available to me.
Yep. As Nathan has said, yell when you're ready.
I started to review all of the changes [it is long, which is why some of
the other email replies have overtaken my thoughts], and this is pretty
much a transcript of that review up to the point where I decided to stop
(after all the libraries).
PM_ERROR_<foo> are special negative error codes ... probably
need a different name for this one ... PM_FD_NULL?
+#define PM_ERROR_FD NULL
And I think there may be a knock-on problem (that you've
identified): in many places we rely on the "fd is an int
type" assumption to return a PM_ERROR_<foo> value, and if this maps into NULL
we've lost the details of the _cause_ of the error, e.g. in
auxconnect.c the new code
is not semantically the same as the old code.
Variants of this occur in lots of places.
I suspect the following fd uses may have been caught up in these changes:
- client <--> pmcd : these are OK and the focus of the changes
- client <--> archive : these should probably not be changed
- pmlc <--> pmlogger : we have not talked about this
specifically, but I suspect this needs the same treatment as
client <--> pmcd
- recording client <--> pmlogger : these should probably not be changed
- client <--> pmtime : these should probably not be changed
- client <--> pmproxy <--> pmcd : grunt ... probably should be
handled but I have no idea how the "chaining" through pmproxy works.
My guess is that some of these use PDU routines and that's the
point where the NSS changes leaked out. Keeping int fd's as
indexes into our own array of IPC/socket control structures (as
outlined below) might allow all of these variants to co-exist
without changing the IPC channels that are not intended for NSS.
Probably need some discussion on this one. If we keep the "fd
is an int type" semantics, but make fd an index into an array of
socket abstraction structures ... this might work because all of
the actual open/read/write/close calls have already been mapped
onto new __pm<op> routines, e.g. __pmCloseSocket(),
__pmRead(), ... or could be handled in __pmXmitPDU() and
In summary, I would strongly advocate reverting all the __pmFD
changes back to int, and then seeing if we can localize the
changes needed to just those places where the I/O stream may be
protected by NSS.
This would _dramatically_ reduce the scope and size of the
changes, e.g. _none_ of the p_*.c routines in libpcp would need
to be changed.
We try _really_ hard to ensure all *alloc() calls succeed or are
trapped asap ... in most cases recovery is not possible.
Here, you need to check for malloc failure, and probably call
pmNoMem(..., PM_FATAL_ERR) to emit a message and exit.
This way all the callers of __pmAllocHostEntBuffer() don't need
to check, as success == return.
Minor nit also, the prevailing PCP coding style would not have a
space after the function name in the prototype (here and in the
Many (all?) of the FILE * -> __pmFD changes look suspect. All
of the routines/data elements below relate to PCP archives which
should be outside the scope of the NSS changes I suspect. This
may be another variant of the int -> __pmFD change discussed above.
- FILE *l_tifp; /* temporal index */
- FILE *l_mdfp; /* meta data */
- FILE *l_mfp; /* current metrics log */
+ __pmFD l_tifp; /* temporal index */
+ __pmFD l_mdfp; /* meta data */
+ __pmFD l_mfp; /* current metrics log */
-extern FILE *__pmLogNewFile(const char *, int);
+extern __pmFD __pmLogNewFile(const char *, int);
-extern int __pmLogRead(__pmLogCtl *, int, FILE *, pmResult **, int);
-extern int __pmLogWriteLabel(FILE *, const __pmLogLabel *);
+extern int __pmLogRead(__pmLogCtl *, int, __pmFD, pmResult **, int);
+extern int __pmLogWriteLabel(__pmFD, const __pmLogLabel *);
-extern void __pmLogCacheClear(FILE *);
+extern void __pmLogCacheClear(__pmFD);
-extern int __pmLogChkLabel(__pmLogCtl *, FILE *, __pmLogLabel *, int);
+extern int __pmLogChkLabel(__pmLogCtl *, __pmFD, __pmLogLabel *, int);
-__pmProcessCreate(char **argv, int *infd, int *outfd)
+__pmProcessCreate(char **argv, __pmFD *infd, __pmFD *outfd)
This is another example of the int -> __pmFD mapping that is not
needed (this routine is never used for any IPC that needs to be
protected by NSS, and I doubt that __pmPipe() is needed at all).
Ditto for this source file ... all the I/O here is between a
client that has started a recording session with pmlogger and
the pmlogger process itself. Both processes are running on the
same machine, so no NSS role here I believe.
Similarly ... all the I/O here is between a client and the
pmtime(1) controller. Both processes are running on the same
machine, so no NSS role here I believe.
This time the I/O is between the trace PMDA and a client using
libpcp_trace. Both processes are running on the same machine,
so no NSS role here I believe.
Another innocent bystander caught up in the global int -> __pmFD change.
Ditto. Only local PCP archive I/O happening in these places.
Ditto again. The only I/O happening here is between PMCD and
PMDAs which does not need NSS.
More __pmFD changes here that are probably not needed. The
socket abstraction here looks fine.
Not sure what changing t_who from int to pmcdWho is about (but
suspect it is more fallout from __pmFD) ... PMCD tracing is only
for diagnostic use.
Only change here is a Red Hat copyright assertion?
And that's as far as I got ...