pcp
[Top] [All Lists]

Re: [pcp] SSL/TLS and IPv6 for PCP via NSS/NSPR

To: pcp@xxxxxxxxxxx
Subject: Re: [pcp] SSL/TLS and IPv6 for PCP via NSS/NSPR
From: Dave Brolley <brolley@xxxxxxxxxx>
Date: Mon, 25 Jun 2012 12:08:13 -0400
In-reply-to: <1340349342.32180.25.camel@xxxxxxxxxxxxxxxxxxxxxxx>
References: <4FE1FE5B.6020302@xxxxxxxxxx> <1340349342.32180.25.camel@xxxxxxxxxxxxxxxxxxxxxxx>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:12.0) Gecko/20120430 Thunderbird/12.0.1


On 06/22/2012 03:15 AM, Ken McDonell wrote:
Welcome ... I spent 3 winters in Alberta, and our twin daughters were born in Edmonton, so I have a long-standing association with Canada. And your stint at Red Hat commenced just a few years after we started work on PCP ... 8^)>
3 Winters in Alberta. You have instantly earned my respect as one tough dude! :-)
As Max has suggested, I think it would have been good to have this discussion a little earlier in terms of the overall approach and trade-offs.
I'll accept the blame for that. I thought I could get a prototype working before presenting it, however, that is taking longer than I had anticipated and Frank (fche) finally convinced me to show you what I had.
I think we're in good shape for the client<-->  pmcd communication, as
there is already an initial handshake where we exchange PDU version and
other info.  For the other communication types (much more on this
below), we don't have a comparable hook.
That's good to know. From the discussion so far, it appears that secure connections between pmcd and the agents are less of a priority, so we can tackle that protocol separately at a later time, if needed.

...
The biggest issue I have run across is the
assumption that a file descriptor is an integer. This assumption is
pervasive throughout PCP and is manifested in the following ways:

- returning integer return codes on error and a file descriptor upon
success from many functions
- indexing arrays using file descriptors
- assuming that file numbers 0, 1 and 2 are stdin, stdout and stderr
respectively.
- printing file descriptors as integers
- keeping track of the "max" file descriptor
    - for the purpose of calls to 'select'
    - for the purpose of iteration

This has resulted in some abstractions designed to remove this
assumption from the mainline code. For example __pmUpdateMaxFD

A second issue is that many components use sockets, files and pipes
seamlessly and that file descriptors representing all of these are
passed to some functions. As a result, these functions must use the
abstracted data types and I have had to extend the POSIX-NSS/NSPR
abstraction to normal file and pipe I/O in many cases. I have, so far,
left as much code as I can unabstracted, however, for the sake of
consistency, it may make sense to convert all I/O (at least within the
pmcd) to use the abstraction layer.
As you'll see from my comments below, I think we have a problem here ...
and I believe there may be a better solution.
[ ... ]
I started to review all of the changes [it is long, which is why some of
the other email replies have overtaken my thoughts], and this is pretty
much a transcript of that review up to the point where I decided to stop
(after all the libraries).


src/include/pcp/pmio.h

         PM_ERROR_<foo>  are special negative error codes ... probably
         need a different name for this one ... PM_FD_NULL?
         +#define PM_ERROR_FD NULL

         And I think there may be a knock-on problem (that you've
         identified) and in many places we are using the "fd is an int
         type" to return a PM_ERROR_<foo>  value if this maps into NULL,
         we've lost the details of the _cause_ for the error, e.g. in
         auxconnect.c the new code
                 return PM_ERROR_FD;
     is not semantically the same as the old code
        return -neterror();

     Variants of this occur in lots of places.

         I suspect the following fd uses may have been caught up in these
         changes
         - client<-->  pmcd : these are OK and the focus of the changes
         - client<-->  archive : these should probably not be changed
         - pmlc<-->  pmlogger : we have not talked about this
         specificially, but I suspect this needs the same treatment as
         client<-->  pmcd
         - recording client<-->  pmlogger : these should probably not be
         changed
         - client<-->  pmtime : these should probably not be changed
         - client<-->  pmproxy<-->  pmcd : grunt ... probably should be
         handled but I have no idea how the "chaining" through pmproxy
         might work

         My guess is that some of these use PDU routines and that's the
         point where the NSS changes leaked out.   Keeping int fd's as
         indexes into our own array if IPC/socket control structures (as
         outlined below) might allow all of these variants to co-exist
         without changing the IPC channels that are not intended for NSS
         handling.

         Probably need some discussion on this one.  If we keep the "fd
         is an int type" semantics, but make fd an index into an array of
         socket abstraction structures ... this might work because all of
         the actual open/read/write/close calls have already been mapped
         onto new __pm<op>  routines, e.g.  __pmCloseSocket(),
         __pmRead(), ... or could be handled in __pmXmitPDU() and
         __pmRecvPDU().

         In summary, I would strongly advocate reverting all the __pmFD
         changes back to int, and then seeing if we can localize the
         changes needed to just those places where the I/O stream may be
         using NSS.

         This would _dramatically_ reduce the scope and size of the
         changes, e.g. _none_ of the p_*.c routines in libpcp would need
         to be changed.
I welcome this suggestion. The whole int <--> FD issue has been the biggest obstacle and, as you have noticed, a lot of changes leaked out to places where there are really not needed due to the use of common interfaces like the PDU functions.

src/libpcp/src/auxconnect.c
__pmAllocHostEntBuffer (void)

         We try _really_ hard to ensure all *alloc() calls succeed or are
         trapped asap ... in most cases recovery is not possible.

         Here, you need to check for malloc failure, and probably call
         pmNoMem(..., PM_FATAL_ERR) to emit a message and exit.

         This way all the callers of __pmAllocHostEntBuffer() don't need
         to check, as success == return.
ok

         Minor nit also, the prevailing PCP coding style would not have a
         space after the function name in the prototype (here and in the
         header file).
Yes. I have been trying to observe the coding style as it exists, but some old habits are hard to break.

src/include/pcp/impl.h

         Many (all?) of the FILE * ->  __pmFD changes look suspect.  All
         of the routines/data elements below relate to PCP archives which
         should be outside the scope of the NSS changes I suspect.  This
         may be another variant of the int ->  __pmFD change discussed
         above.
Some of the leaked changes may have indeed overstepped the actual need in terms of being necessary for compilation. I was beginning to believe that *all* the I/O in PCP would need to be converted and may have done so in a few cases.
         Not sure what changing t_who from int to pmcdWho is about (but
         suspect it is more fallout from __pmFD) ... PMCD tracing is only
         for diagnostic use.
Yes, some tracing calls were passing file descriptors and others were various integer status codes.

src/libpcp_pmda/src/help.c

     Only change here is a Red Hat copyright assertion?
There must have been a change here that was reverted.

And that's as far as I got ...
I appreciate the quick review. I am very interested in the idea of a FD as an index into a table of actual file descriptors and, I believe that this will solve most of the issues you have raised. Is the IPC table what you had in mind, or are thinking of a separate table?

Thanks,
Dave

<Prev in Thread] Current Thread [Next in Thread>